Dagster Flashcards

Data orchestrator for building, scheduling, and observing data assets

💡 What is Dagster?

An open-source data orchestrator focused on assets, reliability, and developer productivity for data platforms.

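🧱 Software-Defined Assets (SDAs)

Declare data assets (tables, files, ML models) as first-class objects in Python code. Dependencies come from the code itself (for example, function parameters), and Dagster tracks each asset's metadata and lineage.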

⚙️ Ops & Jobs

Ops are reusable computation units. Jobs assemble ops/assets into executable graphs.

🧰 Resources

Inject external systems (DBs, APIs, warehouses) as typed resources for clean, testable I/O.

🗂️ Partitions & Backfills

Partition assets by time (e.g., daily) or by a fixed set of keys (e.g., regions), backfill a selected range of partitions, and track the materialization status of every partition.

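📦 IO Managers

Pluggable storage that controls how op/asset outputs are written and how downstream inputs are loaded (e.g., pickled files on local disk, object stores such as S3, or dataframes written to warehouse tables).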

⏰ Schedules & Sensors

Trigger jobs on cron-like schedules or react to external events (new files, table updates, custom signals).

🧪 Type System & Config

Typed inputs/outputs, schema-validated run config, and rich metadata for runtime safety and clarity.

🔭 Observability

Asset materializations, asset checks (data quality tests), run logs, and lineage views in the UI for debugging and governance.

🖥️ Dagster UI

Visualize graphs, kick off runs, watch logs, inspect partitions, and manage schedules/sensors from the web UI.

🔗 Integrations

Works with dbt, Spark, pandas/Polars, Snowflake, BigQuery, Redshift, Airbyte/Fivetran, Kafka, and more.

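🚀 Deployment

Run locally (dagster dev), on Kubernetes (official Helm chart), or on Dagster+ (formerly Dagster Cloud) in Serverless (fully managed) or Hybrid (managed control plane, your own compute) mode. Code locations let each project be built and deployed independently, which fits CI/CD pipelines.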