Mastering Airflow: Building an ETL Pipeline

PROJECT


In this project, we’ll learn how to integrate ETL code into Airflow, configure a DAG, and use Airflow features to make the code adapt to user-defined parameters.

You will learn to:

Design and implement scalable data collection workflows with Airflow.

Integrate diverse data sources into Airflow pipelines.

Understand Apache Airflow fundamentals for data pipeline orchestration.

Build end-to-end data pipelines with Airflow.

Store and organize data in a data lake using Airflow DAGs.

Schedule and manage daily data pipelines using Airflow.

Skills

Data Collection

Data Cleaning

Data Engineering

Data Pipeline Engineering

Task Automation

Prerequisites

Proficiency in the Python programming language

Understanding of ETL processes and data pipeline concepts

Fundamentals of data pipelines

Familiarity with Airflow

Technologies

Python

Pandas

Apache Airflow

Project Description

Data collection is a common task for data professionals (analysts, scientists, and engineers). Doing it efficiently means automating, scaling, and managing the process, and Airflow helps with all three. Apache Airflow is an open-source platform for authoring, scheduling, and monitoring pipelines. It can be deployed to cloud environments, run tasks written in different programming languages, and integrate with several data sources.

In this project, we’ll collect data from different sources, store it in a structure similar to a data lake, and organize it into daily Airflow pipelines.
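The raw-then-refined flow described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `raw` and `refined` folder names come from the task titles, while the CSV format, the one-file-per-day layout, the cleaning rule, and the function names `partition_path`, `save_raw`, and `refine` are all assumptions made for the example.

```python
import csv
from datetime import date
from pathlib import Path


def partition_path(root: Path, zone: str, dataset: str, day: date) -> Path:
    """Build a dated path such as <root>/raw/<dataset>/2024-01-02.csv (assumed layout)."""
    return root / zone / dataset / f"{day.isoformat()}.csv"


def save_raw(root: Path, dataset: str, day: date, fieldnames, rows) -> Path:
    """Write the collected rows, unmodified, into the raw zone."""
    path = partition_path(root, "raw", dataset, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)
    return path


def refine(root: Path, dataset: str, day: date) -> Path:
    """Copy one day's raw file to the refined zone after light cleaning.

    Cleaning rule (an assumption for this sketch): strip whitespace from
    every value and drop rows that contain an empty value.
    """
    raw = partition_path(root, "raw", dataset, day)
    refined = partition_path(root, "refined", dataset, day)
    refined.parent.mkdir(parents=True, exist_ok=True)

    with raw.open(newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        cleaned = [{k: (v or "").strip() for k, v in row.items()} for row in reader]
    cleaned = [row for row in cleaned if all(row.values())]

    with refined.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cleaned)
    return refined
```

Keying every file by date means a daily pipeline run only touches its own partition, so rerunning a single day overwrites that day's file and leaves the others alone.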

Project Tasks

1. Introduction

Task 0: Get Started

2. ETL of the Snapshot Data

Task 1: Collect the Snapshot Data

Task 2: Save the Data in the Raw Folder

Task 3: Transfer the Data to the Refined Folder

3. ETL of the Time-Based Data

Task 4: Collect the Time-Based Data

Task 5: Save the Data to the Raw Folder

Task 6: Transfer the Data to the Refined Folder

4. Leverage Your Solution with Airflow

Task 7: Sign into Airflow

Task 8: Build the First DAG

Task 9: Add the Snapshot Data to DAG

Task 10: Add Parameters to DAG

Task 11: Optimize the Snapshot Data Collection

Task 12: Add the Time-Based Data to DAG

Task 13: Identify the Missing Dates of the Stock

Task 14: Fill the Missing Stock Data

5. Advanced Configurations

Task 15: Add Variables

Task 16: Control Access to the DAGs
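Tasks 13 and 14 deal with dates for which stock data is missing. The outline does not say how the project detects them, so the following is only one possible approach: walk the calendar between two dates and report every day that has no stored data. The function name `missing_dates`, its parameters, and the weekday-only rule are assumptions for illustration; market holidays are not handled.

```python
from datetime import date, timedelta
from typing import Iterable, List


def missing_dates(
    start: date,
    end: date,
    present: Iterable[date],
    weekdays_only: bool = True,
) -> List[date]:
    """Return the days in [start, end] that are absent from `present`.

    With weekdays_only=True, Saturdays and Sundays are never reported,
    on the assumption that no stock data is expected for them.
    """
    have = set(present)
    gaps = []
    day = start
    while day <= end:
        expected = day.weekday() < 5 or not weekdays_only  # Mon=0 ... Sun=6
        if expected and day not in have:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

The returned list could then drive a backfill step that collects data for each missing day, which is roughly what Task 14's title suggests.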

