Getting Started

Learn about the general prerequisites of this course and what it offers to its learners.

In this course, we’ll get hands-on experience building ETL pipelines for transferring data for business intelligence (BI) and analytics.

Who should take this course?

ETL is crucial for BI and analytics. Today's organizations are data-driven: they place a high value on making strategic decisions based on data analysis and overall BI.

The ability to build ETL pipelines to transfer data is valuable to organizations because it provides a steady supply of clean data that can be analyzed and, ultimately, used to serve customers better. A typical ETL process combines many tools, languages, and techniques, which also makes it an excellent way of practicing how to build systems out of multiple components.

Anyone who is, or aspires to be, a data engineer, data scientist, data analyst, or developer, or who is interested in creating data pipelines that serve data warehouses and analytics platforms using the ETL paradigm, can benefit from this course.

Prerequisites

Throughout this course, we'll use various tools and languages to complete the steps in the ETL process. These might include:

  • Shell scripting languages like Bash

  • SQL database solutions such as MySQL or PostgreSQL

  • NoSQL database solutions like MongoDB

  • Programming languages like Python and data processing frameworks like Apache Spark

  • Data orchestration tools like Apache Airflow

While familiarity with these tools is recommended, feel free to use any tools you like to complete the tasks. Actively engaging with the work, modifying the code, or writing your own solutions is very much encouraged. Remember, we're problem-solvers first and foremost. The tool itself doesn't matter; what matters is getting the job done well and in the least amount of time.

Course structure

This course is divided into five sections, and each section focuses on a particular topic. Here’s a brief overview of what to expect in each section.

Introduction

We’ll begin by briefly introducing each stage in the ETL process. We’ll then define ETL pipelines and data pipelines in general and explore the fundamental principles and techniques behind building ETL pipelines. We’ll learn about the tools, techniques, and everyday use cases for the two main ETL paradigms: batch and streaming pipelines.

Next, we’ll explore a real-life use case and build an entire ETL pipeline from scratch using the shell scripting language Bash. After that, we’ll get an overview of data warehouses, which are an integral part of most ETL pipelines and are the central storage location for most analytics-related data.

Then, we’ll discuss some common examples and use cases of real-life ETL pipelines and look at how companies constantly use them to meet their analytical needs in the modern data jungle.

In the following three sections of the course, we’ll get hands-on experience with each major step in the ETL pipeline: extract, transform, and load. In each section, we’ll go over common ways of performing the step, along with best practices and interactive activities filled with code snippets to help you become proficient at transferring data.

Extract

This section covers how to extract data from various sources, including web pages (via scraping), REST APIs, databases, and cloud storage solutions. We’ll become proficient at extracting data from common data sources such as files, APIs, and databases, both on-prem and cloud-based.

Extracting data from different data sources
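To give a first taste of this step, here is a minimal sketch in Python of extracting records from a REST API. It assumes the third-party requests library; the endpoint URL and the orders data are hypothetical placeholders, and the course covers many more sources and techniques.

# A minimal sketch of the extract step: pull raw records from a
# hypothetical REST API and persist them untouched for later transforms.
import json

import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint


def extract_orders(output_path="orders_raw.json"):
    """Fetch raw order records from the API and save them to a local file."""
    response = requests.get(API_URL, params={"limit": 100}, timeout=30)
    response.raise_for_status()  # fail fast on HTTP errors
    with open(output_path, "w") as f:
        json.dump(response.json(), f)  # keep the raw payload as-is
    return output_path


if __name__ == "__main__":
    print(f"Raw data written to {extract_orders()}")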

Transform

After collecting data, we’ll learn how to process and transform it using tools such as SQL, Python, Apache Spark, and Bash. We’ll learn how to clean data, verify data integrity, eliminate duplicates, normalize the data, and add business context to otherwise generic and raw data.

Processing and transforming data
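As an illustration, a transform step written in Python with pandas might look like the sketch below. The column names and the "large order" business rule are hypothetical placeholders; the same ideas apply when transforming with SQL, Spark, or Bash.

# A minimal sketch of the transform step using pandas; the column names
# and the business rule are hypothetical placeholders.
import pandas as pd


def transform_orders(input_path="orders_raw.json", output_path="orders_clean.csv"):
    """Clean, deduplicate, and enrich the raw records extracted earlier."""
    df = pd.read_json(input_path)

    df = df.drop_duplicates(subset=["order_id"])            # eliminate duplicates
    df = df.dropna(subset=["order_id", "amount"])           # basic integrity check
    df["country"] = df["country"].str.strip().str.upper()   # normalize values
    df["is_large_order"] = df["amount"] > 1000              # add business context

    df.to_csv(output_path, index=False)
    return output_path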

Load

Next, we’ll learn how to load the data into various analytical environments, such as relational and non-relational databases, on-prem data warehouses, or cloud solutions like Google Cloud’s BigQuery.

Load to destination system
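For a flavor of this step, here is a minimal sketch of loading a cleaned CSV file into BigQuery with the google-cloud-bigquery client library. It assumes credentials are already configured, and the project, dataset, and table names are hypothetical placeholders.

# A minimal sketch of the load step into Google Cloud BigQuery; the table
# ID below is a hypothetical placeholder.
from google.cloud import bigquery


def load_orders(csv_path="orders_clean.csv",
                table_id="my-project.analytics.orders"):  # hypothetical table
    """Append the cleaned CSV file to a BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,               # skip the CSV header row
        autodetect=True,                   # let BigQuery infer the schema
        write_disposition="WRITE_APPEND",  # add to any existing rows
    )
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # block until the load job finishes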

Combining everything

Finally, we’ll combine the steps and create a complete ETL pipeline, and we’ll learn how to monitor, automate, and orchestrate the ETL pipeline using Apache Airflow.

ETL pipeline
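As a preview of what orchestration looks like, a bare-bones Airflow DAG that chains the three steps might resemble the sketch below (written against the Airflow 2.x API). The etl_tasks module and its functions are hypothetical placeholders for the extract, transform, and load code built earlier.

# A minimal sketch of orchestrating the pipeline with Apache Airflow 2.x;
# the etl_tasks module and its three functions are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from etl_tasks import extract_orders, transform_orders, load_orders  # hypothetical

with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # run the whole pipeline once a day
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    transform = PythonOperator(task_id="transform", python_callable=transform_orders)
    load = PythonOperator(task_id="load", python_callable=load_orders)

    extract >> transform >> load  # enforce the ETL ordering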

These projects will provide valuable practice in building ETL pipelines and demonstrate what ETL pipelines look like in the real world.

Course roadmap

With that in mind, let’s begin the journey of becoming proficient in creating ETL pipelines.