Airflow DAG Design
Learn how to design a DAG and its best practices.
To create DAGs, we just need basic knowledge of Python. However, to create efficient and scalable DAGs, it's essential to master Airflow's specific features and nuances. This lesson will guide us through the process of building DAGs using advanced Airflow features to achieve optimal performance and functionality. This lesson uses Airflow version 2.6.
Create a DAG object
A DAG file starts with a dag object. We can create a dag object using a context manager or a decorator. Examples of this lesson are available in the "Demo" section.
from airflow.decorators import dag
from airflow import DAG
import pendulum

# dag1 - using @dag decorator
@dag(
    schedule="30 4 * * *",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["educative"],
)
def educative_dag1():
    pass

educative_dag1()

# dag2 - using context manager
with DAG(
    dag_id="educative_dag2",
    schedule="@daily",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["educative"],
) as dag2:
    pass
Either way, we need to define a few parameters to control how a DAG is supposed to run. Some of the most-used parameters are:
start_date: If it's a future date, it's the timestamp from which the scheduler starts scheduling runs of the DAG. If it's a past date, it's the timestamp from which the scheduler will attempt to backfill.
catchup: Whether to perform scheduler catch-up. If set to True, the scheduler will backfill a run for every interval that has elapsed since the start date. If set to False, it only runs the latest interval.
schedule: Scheduling rules. In Airflow 2.6, it accepts a cron string (or a preset such as @daily), a timedelta object, a timetable, or a list of dataset objects.
tags: List of tags helping us search DAGs in the UI.
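To make the interaction between start_date, schedule, and catchup more concrete, here is a minimal sketch in plain Python. It does not use Airflow and is not how the scheduler is implemented. It is a simplified model that assumes a fixed-length schedule and that a run is created only after its data interval has fully elapsed. The function name intervals_to_run and its parameters are hypothetical and chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone


def intervals_to_run(start_date, now, interval, catchup):
    """Return the (start, end) data intervals to run in a simplified model.

    Hypothetical helper for illustration; this is not an Airflow API.
    """
    runs = []
    start = start_date
    # A run is created only once its data interval has fully elapsed.
    while start + interval <= now:
        runs.append((start, start + interval))
        start += interval
    # With catchup disabled, keep only the most recent completed interval.
    return runs if catchup else runs[-1:]


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
now = datetime(2023, 1, 4, 12, tzinfo=timezone.utc)
daily = timedelta(days=1)

backfilled = intervals_to_run(start, now, daily, catchup=True)
latest_only = intervals_to_run(start, now, daily, catchup=False)

print(len(backfilled))   # number of intervals backfilled with catchup=True
print(latest_only)       # the single latest interval with catchup=False
```

In this model, a start_date in the future yields no runs at all, because no interval has completed yet. This matches the description of start_date above.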
Create a task object
A DAG object is composed of a series of dependent tasks. A task can be an operator, a sensor, or ...