Batch Ingestion

Learn different batch ingestion patterns with a real-life example using BigQuery.

Data ingestion is the first stage in most data architecture designs. The process has two steps: first, it consumes data from assorted sources; second, it loads that data into centralized storage, where it can be accessed and used across the organization. Ingestion is a critical component of the data engineering life cycle because downstream systems rely entirely on the ingestion layer's output.
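The two steps can be sketched in plain Python. The `consume` and `load` functions and the in-memory stores below are stand-ins for illustration, not a real source system or warehouse:

```python
# Minimal sketch of the two ingestion steps: consume from a source,
# then load into centralized storage. Both stores are in-memory stand-ins.

def consume(source):
    """Step 1: pull raw records from an external source."""
    return list(source)

def load(records, storage):
    """Step 2: write the records into centralized storage."""
    storage.extend(records)
    return len(records)

source_system = [{"id": 1}, {"id": 2}]
warehouse = []
loaded = load(consume(source_system), warehouse)
```

In a real pipeline, `consume` would read from an API, database, or file drop, and `load` would write to a warehouse table.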

[Figure: Two steps in data ingestion]

The ingestion layer works with various data sources, which data engineers typically don't have full control over. A good practice is to build a layer of data quality checks and a self-healing system that reacts to unexpected situations such as data loss, corruption, and system failure. Let's explore a traditional but widely used design pattern, batch ingestion, with a real-life example using BigQuery.
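One simple form of such a quality layer is a validation gate that routes bad rows aside instead of failing the whole batch. The function names, required fields, and dead-letter list below are hypothetical, a sketch of the idea rather than a production design:

```python
# Hypothetical quality gate: validate each record before it enters storage,
# routing invalid rows to a dead-letter list for later inspection.

def validate(record, required_fields=("id", "amount")):
    """A record passes if every required field is present and non-null."""
    return all(record.get(f) is not None for f in required_fields)

def ingest_with_checks(records):
    good, dead_letter = [], []
    for r in records:
        (good if validate(r) else dead_letter).append(r)
    return good, dead_letter

batch = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": None}]
good, bad = ingest_with_checks(batch)
```

A real system would persist the dead-letter rows and alert on their volume, which is what enables the self-healing behavior described above.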

Batch ingestion is a commonly used way to ingest data. It processes data in bulk: a subset of data is extracted from the source system and loaded into internal data storage, triggered either by a time interval or by the size of the accumulated data.
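The two trigger conditions can be combined in a single buffer that flushes whichever threshold is hit first. This is a minimal sketch under assumed names (`Batcher`, `max_size`, `max_interval`), not a specific library's API:

```python
import time

# Sketch of a batcher that flushes either when the buffer reaches
# max_size records or when max_interval seconds have elapsed.
class Batcher:
    def __init__(self, max_size=100, max_interval=3600, now=time.monotonic):
        self.max_size = max_size
        self.max_interval = max_interval
        self.now = now              # injectable clock, useful for testing
        self.buffer = []
        self.last_flush = self.now()

    def add(self, record):
        """Buffer a record; return a flushed batch if a trigger fires."""
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_size
                or self.now() - self.last_flush >= self.max_interval):
            return self.flush()
        return None

    def flush(self):
        batch, self.buffer = self.buffer, []
        self.last_flush = self.now()
        return batch  # in practice: load this batch into storage

b = Batcher(max_size=2, max_interval=60)
b.add("r1")            # below both thresholds, nothing flushed yet
flushed = b.add("r2")  # size threshold reached, batch is flushed
```

In practice, the time-based trigger usually lives in a scheduler (e.g., a daily cron or orchestrator run) rather than in the buffer itself, but the flush-on-size-or-age logic is the same.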

Time-based vs. size-based batch ingestion

Time-based batch ingestion often processes data on a fixed time interval (e.g., once a day) to provide periodic reporting. It is often used in traditional business ETL or ELT for data warehousing, such as getting daily transactions ...