Getting Started

Learn about the general prerequisites of this course and what it offers to its learners.

In this course, we’ll get hands-on experience building ETL pipelines for transferring data for business intelligence (BI) and analytics.

Who should take this course?

ETL is crucial for BI and analytics. Today's organizations are data-driven: they place a high value on making strategic decisions based on data analysis and overall BI.

The ability to build ETL pipelines to transfer data is valuable to organizations because it provides a steady supply of clean data that can be analyzed and, ultimately, used to serve customers better. A typical ETL process combines many tools, languages, and techniques, which also makes it an excellent way of practicing how to build systems out of multiple components.

Anyone who is, or aspires to be, a data engineer, data scientist, data analyst, or developer, or who is interested in creating data pipelines that serve data warehouses and analytics platforms using the ETL paradigm, can benefit from this course.

Prerequisites

Throughout this course, we'll use various tools and languages to complete the steps in the ETL process. These might include:

  • Shell scripting languages like Bash

  • SQL database solutions such as MySQL or PostgreSQL

  • NoSQL database solutions like MongoDB

  • Programming languages like Python and data processing frameworks like Apache Spark

  • Data orchestration tools like Apache Airflow

While familiarity with these tools is recommended, feel free to use any tools you like to complete the tasks. Actively engaging with the work, modifying the code, or writing your own solutions is very much encouraged. Remember, we're problem-solvers first and foremost. The tool itself doesn't matter; what matters is getting the job done well and in the least amount of time.

Course structure

This course is divided into five sections, and each section focuses on a particular topic. Here’s a brief overview of what to expect in each section.

Introduction

We’ll begin by briefly introducing each stage in the ETL process. We’ll then define ETL pipelines and data pipelines in general and explore the fundamental principles and techniques behind building ETL pipelines. We’ll learn about the tools, techniques, and everyday use cases for the two main ETL paradigms: batch and streaming pipelines.

Next, we’ll explore a real-life use case and build an entire ETL pipeline from scratch using the shell scripting language Bash. After that, we’ll get an overview of data warehouses, which are an integral part of most ETL pipelines and are the central storage location for most analytics-related data.

Then, we’ll discuss some common examples and use cases of real-life ETL pipelines and look at how companies constantly use them to meet their analytical needs in the modern data jungle.

In the following three sections of the course, we’ll get hands-on experience with each major step in the ETL pipeline: extract, transform, and load. In each section, we’ll go over common ways of performing the step, along with best practices and interactive activities filled with code snippets to help you become proficient at transferring data.

Extract

This section covers how to extract data from various sources, including web pages (via scraping), REST APIs, databases, and cloud storage solutions. We’ll become proficient at extracting data from common data sources such as files, APIs, and databases, both on-prem and cloud-based.

Extracting data from different data sources
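To give a first taste of this step, here is a minimal sketch in Python of extracting records from a REST API. It assumes the third-party requests library; the endpoint URL and the orders data are hypothetical placeholders, and the course covers many more sources and techniques.

# A minimal sketch of the extract step: pull raw records from a
# hypothetical REST API and persist them untouched for later transforms.
import json

import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint


def extract_orders(output_path="orders_raw.json"):
    """Fetch raw order records from the API and save them to a local file."""
    response = requests.get(API_URL, params={"limit": 100}, timeout=30)
    response.raise_for_status()  # fail fast on HTTP errors
    with open(output_path, "w") as f:
        json.dump(response.json(), f)  # keep the raw payload as-is
    return output_path


if __name__ == "__main__":
    print(f"Raw data written to {extract_orders()}")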

Transform

After collecting data, we’ll learn how to process and transform it using tools such as SQL, Python, Apache Spark, and Bash. We’ll learn how to clean data, verify data integrity, eliminate duplicates, normalize the data, and add business context to otherwise generic and raw data.

Processing and transforming data
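As an illustration, a transform step written in Python with pandas might look like the sketch below. The column names and the "large order" business rule are hypothetical placeholders; the same ideas apply when transforming with SQL, Spark, or Bash.

# A minimal sketch of the transform step using pandas; the column names
# and the business rule are hypothetical placeholders.
import pandas as pd


def transform_orders(input_path="orders_raw.json", output_path="orders_clean.csv"):
    """Clean, deduplicate, and enrich the raw records extracted earlier."""
    df = pd.read_json(input_path)

    df = df.drop_duplicates(subset=["order_id"])            # eliminate duplicates
    df = df.dropna(subset=["order_id", "amount"])           # basic integrity check
    df["country"] = df["country"].str.strip().str.upper()   # normalize values
    df["is_large_order"] = df["amount"] > 1000              # add business context

    df.to_csv(output_path, index=False)
    return output_path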

Load

Next, we’ll learn how to load the data into various analytical environments, such as relational and non-relational databases, on-prem data warehouses, or cloud solutions like Google Cloud’s BigQuery.

Load to destination system
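For a flavor of this step, here is a minimal sketch of loading a cleaned CSV file into BigQuery with the google-cloud-bigquery client library. It assumes credentials are already configured, and the project, dataset, and table names are hypothetical placeholders.

# A minimal sketch of the load step into Google Cloud BigQuery; the table
# ID below is a hypothetical placeholder.
from google.cloud import bigquery


def load_orders(csv_path="orders_clean.csv",
                table_id="my-project.analytics.orders"):  # hypothetical table
    """Append the cleaned CSV file to a BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,               # skip the CSV header row
        autodetect=True,                   # let BigQuery infer the schema
        write_disposition="WRITE_APPEND",  # add to any existing rows
    )
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # block until the load job finishes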

Combining everything

Finally, we’ll combine the steps and create a complete ETL pipeline, and we’ll learn how to monitor, automate, and orchestrate the ETL pipeline using Apache Airflow.

ETL pipeline
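As a preview of what orchestration looks like, a bare-bones Airflow DAG that chains the three steps might resemble the sketch below (written against the Airflow 2.x API). The etl_tasks module and its functions are hypothetical placeholders for the extract, transform, and load code built earlier.

# A minimal sketch of orchestrating the pipeline with Apache Airflow 2.x;
# the etl_tasks module and its three functions are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from etl_tasks import extract_orders, transform_orders, load_orders  # hypothetical

with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # run the whole pipeline once a day
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    transform = PythonOperator(task_id="transform", python_callable=transform_orders)
    load = PythonOperator(task_id="load", python_callable=load_orders)

    extract >> transform >> load  # enforce the ETL ordering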

These projects will provide valuable practice in building ETL pipelines and demonstrate what ETL pipelines look like in the real world.

Course roadmap

With that in mind, let’s begin the journey of becoming proficient in creating ETL pipelines.