🧩 Dagster Flashcards
Data orchestrator for building, scheduling, and observing data assets
💡 What is Dagster?
An open-source data orchestrator focused on assets, reliability, and developer productivity for data platforms.
🧱 Software-Defined Assets (SDAs)
Declare data assets as first-class objects with code-defined dependencies, metadata, and lineage.
⚙️ Ops & Jobs
Ops are reusable computation units. Jobs assemble ops/assets into executable graphs.
🧰 Resources
Inject external systems (DBs, APIs, warehouses) as typed resources for clean, testable I/O.
🗓️ Partitions & Backfills
Partition assets by time window or by a set of keys, backfill selected partition ranges, and track materialization status per partition.
📦 IO Managers
Pluggable storage that persists each op/asset output and loads it as input for downstream steps (e.g., local files, object stores, or warehouse tables, often holding dataframes).
⏰ Schedules & Sensors
Trigger jobs on cron-like schedules or react to external events (new files, table updates, custom signals).
🧪 Type System & Config
Typed inputs/outputs, schema-validated run config, and rich metadata for runtime safety and clarity.
🔭 Observability
Asset materializations with metadata, asset checks, run logs, and lineage views in the UI for debugging and governance.
🖥️ Dagster UI
Visualize graphs, kick off runs, watch logs, inspect partitions, and manage schedules/sensors from the web UI.
🔌 Integrations
Works with dbt, Spark, Pandas/Polars, Snowflake, BigQuery, Redshift, Airbyte/Fivetran, Kafka, and more.
🚀 Deployment
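Run locally, self-host on Kubernetes, or use Dagster+ (formerly Dagster Cloud), the managed offering with Serverless and Hybrid deployment options. Code locations let teams deploy and update their code independently, which suits CI/CD workflows.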