
Ingestion, Transformation, and Visualization

Learn about three key steps in the data engineering life cycle: ingestion, transformation, and visualization.

The data engineering life cycle contains three important stages: ingestion, transformation, and visualization. Their execution order is not strictly mandated, and each stage can be executed independently.

Ingestion

Data ingestion is the process of importing data from one or more source systems into the storage layer. The following code from the Google Analytics example runs a BigQuery load job that creates a table from the raw CSV file:

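# Ingestion - load raw data into BigQuery
from google.cloud import bigquery
# Assumed setup (defined elsewhere in the full example; values here are placeholders):
client = bigquery.Client()  # picks up default GCP credentials from the environment
src_table_id = "your-project.your_dataset.ga_raw"  # hypothetical table ID
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # assumed: the CSV has a header row
    autodetect=True,  # assumed: let BigQuery infer the schema
)

with open("ga.csv", "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        src_table_id,
        job_config=job_config,
    )
job.result()  # wait for the load job to complete
src_table = client.get_table(src_table_id)
print("Loaded {} rows and {} columns to {}".format(src_table.num_rows, len(src_table.schema), src_table_id))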

Ingestion is a critical and challenging step because source systems and the quality of their data are typically outside data engineers' direct control. Therefore, establishing good collaboration with the teams that own the source systems and implementing data quality checks are essential for ensuring smooth integration with downstream systems.
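Data quality checks on ingested data can start small. Below is a minimal sketch, using only the Python standard library, of checks that could run on the raw CSV before the load job. It verifies that required columns exist, that required fields are non-empty, that count fields hold non-negative integers, and that the file contains at least one data row. The column names (date, sessions, pageviews) and the function name check_csv_quality are hypothetical; the real Google Analytics export may use a different schema.

```python
import csv
import io

# Hypothetical schema for illustration; replace with the real export's columns.
REQUIRED_COLUMNS = ["date", "sessions", "pageviews"]
INTEGER_COLUMNS = ["sessions", "pageviews"]


def check_csv_quality(csv_text, required_columns, integer_columns):
    """Return a list of data quality problems found in CSV text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Schema check: every required column must be present in the header.
    missing = [col for col in required_columns if col not in header]
    if missing:
        problems.append("missing columns: {}".format(", ".join(missing)))
        return problems  # row-level checks make no sense without the columns

    row_count = 0
    # Line numbers assume one record per physical line; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        row_count += 1
        # Completeness check: required fields must not be empty.
        for col in required_columns:
            if not (row.get(col) or "").strip():
                problems.append("line {}: empty value in '{}'".format(line_no, col))
        # Type check: non-empty count fields must be non-negative integers.
        for col in integer_columns:
            value = (row.get(col) or "").strip()
            if value and not value.isdecimal():
                problems.append(
                    "line {}: '{}' is not a non-negative integer: {!r}".format(
                        line_no, col, value
                    )
                )

    # Volume check: an export with a header but no rows is suspicious.
    if row_count == 0:
        problems.append("no data rows")
    return problems


if __name__ == "__main__":
    # Made-up sample rows for demonstration only.
    sample = (
        "date,sessions,pageviews\n"
        "2023-01-01,10,25\n"
        "2023-01-02,,7\n"
        "2023-01-03,abc,3\n"
    )
    for problem in check_csv_quality(sample, REQUIRED_COLUMNS, INTEGER_COLUMNS):
        print(problem)
```

On the sample data, this reports the empty sessions value on line 3 and the non-numeric sessions value on line 4. Returning a list of problems instead of raising on the first one lets a pipeline report every issue in a single run; a caller could, for example, skip the BigQuery load whenever the list is non-empty.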

Nowadays, numerous data ingestion tools can automatically gather data from diverse sources and load it into the destination, which ...