ETL Pipeline Example: Airflow Extraction Task
Learn to add the extract function to an Airflow DAG.
After completing the previous task, we now have a function called extract in a file called helper.py for extracting the latest batch of data from the production database. We'll add this function as a task in an Airflow DAG to complete the extract step. This is how we build our pipelines in Airflow: by adding more and more tasks to a particular DAG.
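As a minimal sketch of that step, assuming Airflow 2.x (the DAG name, start date, and schedule below are illustrative assumptions, not the course's actual settings), wrapping extract in a PythonOperator might look like this:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from helper import extract  # the extract function written in the previous task

# The dag_id, start_date, and schedule here are illustrative assumptions.
with DAG(
    dag_id="etl_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Wrap the extract function in a PythonOperator so Airflow can run it as a task.
    extract_task = PythonOperator(
        task_id="extract",
        python_callable=extract,
    )
```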
Airflow operators
Before showing how to do that, let's discuss Airflow operators. In Airflow, operators are the building blocks that define tasks in a pipeline. Each operator is different and can be used to perform a particular type of task, such as executing a SQL query, transferring files, or running a Python script. Some common operators are:
BashOperator: Used to execute Bash commands or a script
PythonOperator: Used to execute Python commands or a script
HttpSensor: Pokes an HTTP service until a response is received
PostgresOperator: Used to interact with a PostgreSQL database
Each task in an Airflow DAG is executed by some type of operator. An ETL pipeline can be built by using a combination of different operators working in sequence or in parallel. For example, an ETL pipeline built using Airflow operators can look like the following.
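Here is a minimal sketch of such a pipeline, again assuming Airflow 2.x; the transform and load callables are hypothetical placeholders, and the DAG settings are illustrative rather than taken from the course:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# Hypothetical placeholder callables; a real pipeline would do actual work here.
def transform():
    print("transforming the extracted batch")

def load():
    print("loading the transformed batch")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Each step of the ETL pipeline is a task backed by an operator.
    extract_task = BashOperator(
        task_id="extract",
        bash_command="echo 'extracting the latest batch'",
    )
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Chain the operators so they run in sequence: extract -> transform -> load.
    extract_task >> transform_task >> load_task
```

The >> operator defines the ordering between tasks (upstream to downstream); wrapping tasks in a list, e.g. extract_task >> [task_a, task_b] >> load_task, fans them out to run in parallel.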