Gain insights into data engineering foundations, explore data life cycle stages, and delve into creating data pipelines using Python, Kafka, PySpark, Airflow, and dbt.

Data engineering is currently one of the most in-demand fields in data and technology. It sits at the intersection of software engineering, DataOps, data architecture, data management, and security. Data engineers lay the foundation for serving data to consumers such as analysts and data scientists.

In this course, you will learn the foundations of data engineering, covering the main stages of the data life cycle: data warehousing, ingestion, transformation, orchestration, and more. You will also gain hands-on experience building data pipelines with tools such as Python, Kafka, PySpark, Airflow, and dbt.

By the end of this course, you will have a holistic understanding of data engineering and be able to build your own data pipelines to serve data to a variety of consumers.

Data Engineering Foundations in Python

Learn how to do real-time ingestion using streaming platforms.

Getting Started

Data Team Structure

Data Engineering Life Cycle

Cloud Data Architecture

Data Ingestion

Data Modeling

Data Orchestration

Mastering Airflow: Building an ETL Pipeline

Data Quality

Build an End-to-End Data Pipeline for Formula 1 Analysis

Epilogue

Appendix

Ingestion Methods—Streaming Platform

Apache Kafka