Apache Airflow Flashcards

🛫 What is Apache Airflow?
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows, which are defined in Python code as DAGs.

📅 What is a DAG?
A DAG (Directed Acyclic Graph) represents a workflow as a set of tasks linked by dependencies, with no cycles. Dependent tasks run in sequence, while independent tasks can run in parallel.
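
As an illustration of how dependencies determine ordering, here is a minimal plain-Python sketch using the standard library's `graphlib`. It is not Airflow's scheduler; the task names and the `execution_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def execution_batches(deps):
    """Group tasks into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(dependencies)
print(batches)
# prints: [['extract'], ['transform_a', 'transform_b'], ['load']]
```

Tasks in the same batch could run in parallel. A cycle would raise `CycleError`, which is why the graph must be acyclic.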

⚙️ What are Operators?
Operators define the type of work to be done. Examples include `BashOperator`, `PythonOperator`, and `EmailOperator`.

📦 What is a Task?
A Task is a single unit of execution in a DAG, created by instantiating an Operator with parameters.
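
The operator/task relationship can be modeled in plain Python: the class is the template, and each configured instance is a task. This is a sketch only; `EchoOperator` is a hypothetical class, not part of Airflow, though Airflow operators do expose a similar `execute(context)` method.

```python
class EchoOperator:
    """Hypothetical operator: describes one kind of work (echoing a message)."""

    def __init__(self, task_id, message):
        self.task_id = task_id
        self.message = message

    def execute(self, context):
        # The actual work performed when the task runs.
        return f"{self.task_id}: {self.message}"


# Two tasks created from the same operator class with different parameters.
greet = EchoOperator(task_id="greet", message="hello")
farewell = EchoOperator(task_id="farewell", message="goodbye")

results = [task.execute(context={}) for task in (greet, farewell)]
print(results)
# prints: ['greet: hello', 'farewell: goodbye']
```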

🔍 What are Sensors?
Sensors are special operators that wait for a condition to be true before running downstream tasks.
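
The waiting behavior can be sketched as a polling loop. The `wait_for` helper below is a hypothetical plain-Python model, not Airflow code; the parameter names `poke_interval` and `timeout` mirror the ones Airflow sensors accept.

```python
import time


def wait_for(condition, poke_interval=1.0, timeout=10.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout")
        time.sleep(poke_interval)


# Hypothetical condition that becomes true on the third check.
attempts = {"count": 0}


def file_has_arrived():
    attempts["count"] += 1
    return attempts["count"] >= 3


wait_for(file_has_arrived, poke_interval=0.01, timeout=1.0)
print(attempts["count"])
# prints: 3
```

Only after `wait_for` returns would downstream work begin. If the condition never holds, the timeout stops the wait instead of blocking forever.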

🗓️ What is a Schedule Interval?
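The schedule interval defines how often a DAG runs. It can be a cron expression or a preset such as `@daily` or `@hourly`. In Airflow 2.4 and later it is set with the `schedule` parameter, which replaces the older `schedule_interval`.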
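
Each preset is shorthand for a cron expression. The lookup below is a small illustrative sketch; the `to_cron` helper is hypothetical, while the preset-to-cron pairs follow the values listed in the Airflow documentation.

```python
# Common presets and their equivalent five-field cron expressions.
PRESETS = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight on Sunday
    "@monthly": "0 0 1 * *",  # midnight on the first day of the month
    "@yearly": "0 0 1 1 *",   # midnight on January 1
}


def to_cron(schedule):
    """Expand a preset to its cron expression; pass other values through."""
    return PRESETS.get(schedule, schedule)


print(to_cron("@daily"))
# prints: 0 0 * * *
print(to_cron("30 6 * * 1-5"))
# prints: 30 6 * * 1-5
```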

🖥️ What is the Airflow Web UI?
A rich web interface for monitoring DAGs, viewing logs, triggering tasks, and managing configurations.

📂 What is a Task Instance?
A Task Instance is a specific run of a task within a particular DAG run. It is identified by the DAG, the task, and the run's logical date (called the execution date in older Airflow versions).

🔄 What is Task Retry?
Airflow can retry a failed task a specified number of times, waiting a configurable delay between attempts, using the `retries` and `retry_delay` parameters.
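
The retry semantics can be sketched in plain Python. The `run_with_retries` helper and `flaky_task` are hypothetical, not Airflow code; the sketch assumes `retries` counts extra attempts after the first failure and `retry_delay` is a `timedelta`, as in Airflow.

```python
import time
from datetime import timedelta


def run_with_retries(task, retries=3, retry_delay=timedelta(seconds=0)):
    """Run `task`, retrying up to `retries` more times after a failure."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return task()
        except Exception:
            if attempt > retries:
                raise  # retries exhausted: surface the last error
            time.sleep(retry_delay.total_seconds())


# Hypothetical task that fails twice and then succeeds.
calls = []


def flaky_task():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky_task, retries=3, retry_delay=timedelta(seconds=0))
print(result, len(calls))
# prints: ok 3
```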

📌 What is XCom?
XCom (Cross Communication) is used for sharing small pieces of data between tasks in a DAG.
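
The push/pull pattern can be modeled with a small in-memory store. `XComStore` is a hypothetical stand-in, not Airflow's implementation; the only detail borrowed from Airflow is the default key name `return_value`. Values are serialized to JSON here to suggest why XCom suits small pieces of data.

```python
import json


class XComStore:
    """Hypothetical in-memory model of values shared between tasks."""

    def __init__(self):
        self._data = {}

    def push(self, task_id, value, key="return_value"):
        # Values must be serializable, which favors small payloads.
        self._data[(task_id, key)] = json.dumps(value)

    def pull(self, task_id, key="return_value"):
        raw = self._data.get((task_id, key))
        return None if raw is None else json.loads(raw)


store = XComStore()
store.push("extract", {"row_count": 42})  # upstream task shares a result
pulled = store.pull("extract")            # downstream task reads it
print(pulled)
# prints: {'row_count': 42}
```

For large datasets, a common approach is to write the data to external storage and pass only its location between tasks.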

🔐 How does Airflow handle authentication?
Airflow supports multiple auth backends like LDAP, OAuth, or password-based auth using Flask AppBuilder.

🧪 How to test DAGs?
Use the `airflow dags test` command to execute a single DAG run locally, or write unit tests that exercise task logic without the scheduler.
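
One way to unit test task logic is to keep it in a plain function and test that function directly with the standard library's `unittest`. The sketch below is not tied to any Airflow API; `transform` and its test are hypothetical.

```python
import unittest


def transform(rows):
    """Hypothetical task callable: keep rows with a positive amount."""
    return [row for row in rows if row["amount"] > 0]


class TransformTest(unittest.TestCase):
    def test_drops_non_positive_amounts(self):
        rows = [{"amount": 5}, {"amount": 0}, {"amount": -2}]
        self.assertEqual(transform(rows), [{"amount": 5}])

    def test_empty_input(self):
        self.assertEqual(transform([]), [])


# Run the tests programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TransformTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(outcome.wasSuccessful())
```

Because the function has no dependency on the scheduler, the same callable could later be wrapped in an operator without changing the tests.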