Apache Airflow Flashcards
What is Apache Airflow?
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows (DAGs).
What is a DAG?
A DAG (Directed Acyclic Graph) represents a workflow as tasks linked by dependencies, with no cycles. Tasks run in sequence or in parallel as their dependencies allow.
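As a plain-Python illustration of the idea (not Airflow's own scheduler), the standard-library `graphlib` module can order a small dependency graph and reject a cycle. The task names and the helpers `execution_waves` and `has_cycle` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
deps = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in one wave do not depend on each
    other, so they could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def has_cycle(graph):
    """Return True if the dependency graph is not acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False


print(execution_waves(deps))
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

Here `transform_a` and `transform_b` land in the same wave because both depend only on `extract`, while `load` waits for both.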
What are Operators?
Operators define the type of work to be done. Examples: BashOperator, PythonOperator, EmailOperator, etc.
What is a Task?
A Task is a single unit of execution in a DAG, created using Operators and configured with parameters.
What are Sensors?
Sensors are special operators that wait for a condition to be true before running downstream tasks.
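The waiting behaviour can be sketched in plain Python as a polling loop. This illustrates the idea only and is not Airflow's sensor implementation; the function `wait_for` and the condition `file_has_arrived` are hypothetical, and the argument names `poke_interval` and `timeout` are borrowed from the options Airflow sensors expose.

```python
import time


def wait_for(condition, poke_interval=0.01, timeout=1.0):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# Hypothetical condition: becomes true on the third check.
checks = {"count": 0}


def file_has_arrived():
    checks["count"] += 1
    return checks["count"] >= 3


if wait_for(file_has_arrived):
    print("condition met, downstream tasks may run")
```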
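What is a Schedule Interval?
Defines how often a DAG should run. It can be a cron expression or a preset such as `@daily` or `@hourly`. Since Airflow 2.4 it is passed through the DAG's `schedule` argument, which replaces the older `schedule_interval`.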
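For reference, the common presets correspond to the five-field cron expressions below (minute, hour, day of month, month, day of week). The helper `to_cron` is a hypothetical illustration, not an Airflow function.

```python
# Cron equivalents of common schedule presets.
PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def to_cron(schedule):
    """Return the cron expression for a known preset, or the input
    unchanged if it is not a preset."""
    return PRESETS.get(schedule, schedule)


print(to_cron("@daily"))
print(to_cron("30 6 * * 1-5"))  # a custom expression passes through as-is
```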
What is the Airflow Web UI?
A rich web interface for monitoring DAGs, viewing logs, triggering tasks, and managing configurations.
What is a Task Instance?
A Task Instance is a specific run of a task within a particular DAG run, identified by that run's logical date (called the execution date before Airflow 2.2).
What is Task Retry?
Airflow can retry a failed task a specified number of times, with a configurable delay between attempts, using the `retries` and `retry_delay` parameters.
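The retry semantics can be sketched in plain Python. This is an illustration under assumptions, not Airflow's implementation: `run_with_retries` and `flaky` are hypothetical names, and only the parameter names `retries` and `retry_delay` come from the card above. Note that `retries` counts the extra attempts after the first failure.

```python
import time
from datetime import timedelta


def run_with_retries(task, retries=2, retry_delay=timedelta(seconds=0)):
    """Call `task`; on failure, retry up to `retries` more times,
    sleeping `retry_delay` between attempts. Re-raises the last error."""
    attempts = retries + 1  # the first try plus the retries
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(retry_delay.total_seconds())


# Hypothetical flaky task: fails twice, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky, retries=2, retry_delay=timedelta(milliseconds=10))
print(result, len(calls))
```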
What is XCom?
XCom (short for cross-communication) is used for sharing small pieces of data between tasks in a DAG.
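As a rough mental model, XCom behaves like a small key-value store scoped to a DAG run and a task. The toy class below is a hypothetical in-memory stand-in, not Airflow's XCom backend; `XComStore`, `push`, and `pull` are names chosen for illustration. The default key `return_value` mirrors the key Airflow uses for a task's return value.

```python
import json


class XComStore:
    """Toy in-memory stand-in for an XCom backend (hypothetical)."""

    def __init__(self):
        self._rows = {}

    def push(self, run_id, task_id, value, key="return_value"):
        # Serialize to JSON so only small, serializable values are stored.
        self._rows[(run_id, task_id, key)] = json.dumps(value)

    def pull(self, run_id, task_id, key="return_value"):
        # Return the stored value, or None if nothing was pushed.
        raw = self._rows.get((run_id, task_id, key))
        return None if raw is None else json.loads(raw)


store = XComStore()
store.push("run_1", "extract", {"row_count": 42})  # upstream task shares a value
print(store.pull("run_1", "extract"))              # downstream task reads it
```

Serializing on push is a reminder of why XCom suits small metadata such as counts, IDs, or file paths rather than large datasets.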
How does Airflow handle authentication?
Airflow supports multiple auth backends like LDAP, OAuth, or password-based auth using Flask AppBuilder.
How do you test DAGs?
Use the `airflow dags test` command, or write unit tests, to simulate task execution locally without the scheduler.