Airflow DAG Design

Learn how to design a DAG and follow best practices.

To create DAGs, we just need basic knowledge of Python. However, to create efficient and scalable DAGs, it's essential to master Airflow's specific features and nuances. This lesson will guide us through the process of building DAGs using advanced Airflow features to achieve optimal performance and functionality. This lesson uses Airflow version 2.6.

Create a DAG object

A DAG file starts with a DAG object. We can create a DAG object using a context manager or a decorator. Examples from this lesson are available in the "Demo" section.

from airflow.decorators import dag
from airflow import DAG
import pendulum

# dag1 - using @dag decorator
@dag(
    schedule="30 4 * * *",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["educative"],
)
def educative_dag1():
    pass

educative_dag1()

# dag2 - using context manager
with DAG(
    dag_id="educative_dag2",
    schedule="@daily",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    tags=["educative"],
) as dag2:
    pass

Either way, we need to define a few parameters to control how a DAG is supposed to run. Some of the most-used parameters are:

  • start_date: The timestamp from which the scheduler begins creating DAG runs. If it's a future date, no runs are created until that date is reached. If it's a past date, it's the timestamp from which the scheduler will attempt to backfill when catch-up is enabled.

  • catchup: Whether to perform scheduler catch-up. If set to True, the scheduler will backfill every missed run from the start date. If set to False, it only creates a run for the most recent schedule interval.

  • schedule: The scheduling rule. In Airflow 2.6, it accepts a cron string (including presets such as "@daily"), a timedelta object, a timetable, or a list of dataset objects.

  • tags: List of tags helping us search DAGs in the UI.
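
To make the interaction between start_date and catchup more concrete, here is a minimal sketch in plain Python. It is a simplified model and not Airflow's actual scheduler code: the function name data_interval_starts and its parameters are hypothetical, and it assumes a fixed timedelta schedule where a run is created only after its interval has fully elapsed.

```python
from datetime import datetime, timedelta, timezone


def data_interval_starts(start_date, interval, now, catchup):
    """Return the start of each interval a scheduler would create a run for.

    Simplified model (an assumption, not Airflow's implementation): a run
    covers [start, start + interval) and is created once that window ends.
    """
    starts = []
    current = start_date
    while current + interval <= now:
        starts.append(current)
        current += interval
    if not catchup:
        # Without catch-up, only the most recent completed interval runs.
        starts = starts[-1:]
    return starts


utc = timezone.utc
now = datetime(2023, 1, 4, 12, tzinfo=utc)
daily = timedelta(days=1)

# Past start date: catch-up decides whether missed intervals are backfilled.
past_start = datetime(2023, 1, 1, tzinfo=utc)
with_catchup = data_interval_starts(past_start, daily, now, catchup=True)
without_catchup = data_interval_starts(past_start, daily, now, catchup=False)

# Future start date: nothing is scheduled yet, regardless of catch-up.
future_start = datetime(2024, 1, 1, tzinfo=utc)
not_started = data_interval_starts(future_start, daily, now, catchup=True)

print(len(with_catchup), len(without_catchup), len(not_started))
```

In this model, a past start date with catch-up enabled yields one run per completed day, while disabling catch-up keeps only the latest one. This is why the examples in this lesson set catchup=False alongside a start date in the past.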

Create a task object

A DAG object is composed of a series of dependent tasks. A task can be an operator, a sensor, or ...