Ingestion, Transformation, and Visualization
Learn about three key steps in the data engineering life cycle: ingestion, transformation, and visualization.
The data engineering life cycle contains three important stages: ingestion, transformation, and visualization. Their execution order is not strictly mandated, and each stage can be executed independently.
Ingestion
Data ingestion is the process of importing data from one or more source systems into the storage layer. The following code from the Google Analytics example submits a BigQuery load job that creates a table from the raw CSV file:
# Ingestion - load raw data into BigQuery
with open("ga.csv", "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        src_table_id,
        job_config=job_config,
    )
job.result()  # Wait for the load job to complete
src_table = client.get_table(src_table_id)
print(
    "Loaded {} rows and {} columns to {}".format(
        src_table.num_rows, len(src_table.schema), src_table_id
    )
)
Ingestion is a critical and challenging step because source systems and their data quality are typically outside data engineers' direct control. Therefore, establishing good collaboration with the teams that own the source systems and implementing data quality checks are essential for ensuring smooth integration with the other systems.
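To make the idea of a data quality check concrete, here is a minimal sketch of a validation step that could run on a CSV file before it is loaded. It uses only the Python standard library. The column names (date, page, sessions) and the rules are assumptions chosen for illustration; they are not the actual schema of ga.csv, and real checks would be agreed on with the source system owners.

```python
import csv
import io

# Hypothetical schema: these column names are assumptions for illustration,
# not the real columns of the ga.csv file used in the example.
REQUIRED_COLUMNS = ["date", "page", "sessions"]


def check_quality(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []

    # Schema check: every required column must appear in the header.
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        problems.append("missing columns: {}".format(", ".join(missing)))
        return problems

    # Row checks: required values must be non-empty, and the hypothetical
    # 'sessions' column must hold a non-negative integer.
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(
                    "line {}: empty value in '{}'".format(line_no, column)
                )
        sessions = (row["sessions"] or "").strip()
        if sessions and not sessions.isdigit():
            problems.append(
                "line {}: 'sessions' is not a non-negative integer".format(line_no)
            )
    return problems


sample = (
    "date,page,sessions\n"
    "2023-01-01,/home,12\n"
    "2023-01-02,,7\n"
    "2023-01-03,/about,-3\n"
)
for problem in check_quality(sample):
    print(problem)
```

A check like this could run right before the load job is submitted, so that a file with problems is rejected or set aside for review instead of being ingested. Whether to fail the whole file or only skip the bad rows is a design choice that depends on how the other systems consume the data.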
Nowadays, numerous data ingestion tools can automatically gather data from diverse sources and load it into the destination, which ...