type

project_id

private_key_id

private_key

client_email

client_id

auth_uri

token_uri

auth_provider_x509_cert_url

client_x509_cert_url

Master ETL processes, explore data extraction from MySQL, PostgreSQL, and MongoDB, and learn about scheduling and automating ETL pipelines using tools like Apache Airflow and Python.

testing.tar.gz

pymysql

Flask

mysql_fullExtract

binlog1 - connect_to_binglog

binlog2 - process_events

binlog3 - export_to_csv

gcloud

spark_MCAR

spark_MAR

spark_MNAR

spark_MNAR2

spark_MNAR3

spark_MNAR4

spark_duplicates1

spark_duplicates2

spark_duplicates3

spark_duplicates4

spark_duplicates5

spark_duplicates6

spark_duplicates7

spark_duplicates8

flight_data1

flight_data2

flight_data3

flight_data4

flight_data5

database_exercise

database_exercise1

spark_cleanData1

spark_cleanData2

spark_cleanData3

airflow_standalone

airflow_standalone-copy

airflow_standalone1

pipeline1_extract1

pipeline1_extract1-copy

pipeline1_extract

pipeline1_extract_data-copy

pipeline1_transform1

pipeline1_transform2

pipeline1_transform3

airflowtesting

ETL stands for extract, transform, and load. It’s a collection of processes that combine data from various sources and load them into data warehouses or other data repositories. ETL is crucial for providing data used for business intelligence and analytics.

In this course, you’ll experiment with extracting data from various database solutions such as MySQL, PostgreSQL, and MongoDB. You’ll use query and scripting languages like SQL, Python, and Apache Spark to process data and load it to data repositories or cloud solutions like Google’s GCP. Finally, you’ll learn how to schedule your ETL pipelines using cronjobs or automate and monitor them using open-source tools like Apache Airflow and Python’s pandas library.

After completing this course, you’ll have a strong grasp of various methods, tools, and techniques for transferring data from a source to its destination using ETL pipelines.

Transferring Data with ETL

Learn several methods for collecting and removing duplicate data.

Data Cleaning Using Apache Spark - Duplicate Data

Data Cleaning Using Apache Spark: Duplicate Data

Introduction

E: Extract

T: Transform

L: Load

Orchestration

ETL Pipeline: Fraud Detection Preprocessing

Conclusion

Build a News ETL Data Pipeline Using Python and SQLite

Data Cleaning Using Apache Spark: Duplicate Data

Handling duplicate data