
Push vs. Pull

Understand the difference between push-based ingestion and pull-based ingestion.

We'll cover the following...

Push
Pull
Comparison

From a network communication perspective, there are two main ingestion strategies: push and pull. In a push strategy, the source system sends data to the target. In a pull strategy, the target reads data directly from the source. We will examine the differences between the two strategies and the pros and cons of each.
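The difference comes down to which side initiates the transfer. The following is a minimal in-memory sketch of that idea, not a real networking implementation. The class names `PushSource` and `PullSource` and their methods are hypothetical and chosen only for illustration.

```python
from typing import Callable, List


class PushSource:
    """Source that initiates the transfer: it calls the target when data appears."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, record: str) -> None:
        # The source decides when data moves.
        for callback in self._subscribers:
            callback(record)


class PullSource:
    """Source that only stores data: the target asks for it when ready."""

    def __init__(self) -> None:
        self._log: List[str] = []

    def append(self, record: str) -> None:
        self._log.append(record)

    def read(self, offset: int, max_records: int) -> List[str]:
        # The target decides when data moves and how much of it.
        return self._log[offset:offset + max_records]


# Push: the target registers a callback and then waits to be called.
pushed: List[str] = []
push_source = PushSource()
push_source.subscribe(pushed.append)
push_source.emit("reading-1")
push_source.emit("reading-2")

# Pull: the target chooses the starting offset and the batch size.
pull_source = PullSource()
for record in ("reading-1", "reading-2", "reading-3"):
    pull_source.append(record)
pulled = pull_source.read(offset=0, max_records=2)

print(pushed)
print(pulled)
```

In the push case the target has no say in timing or volume, while in the pull case it picks both through `offset` and `max_records`.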

Push

In push-based ingestion, data is pushed from the source to the target as soon as it becomes available. The source may be highly active, such as an IoT device that generates large volumes of data, or less active, such as a chat application.


Disadvantages of push ingestion

Limited replayability: The source system publishes each message only once. If the consumer misses some messages, it is hard to get them back.
Difficult flow control: In a push system, the source or the intermediate engine controls the flow rate. Consumers can be overwhelmed if their consumption rate falls far behind the production rate. It is also hard for producers to fine-tune the flow rate for every consumer.
Passive consumer: Consumers cannot control how they receive data, which brings further inconveniences, such as being unable to define a batch size. The producer does not know whether a given consumer needs the data one by one or in batches.
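The first two disadvantages can be seen in a small simulation. This is a sketch under simplifying assumptions: the consumer has a bounded inbox, it processes one message for every two it receives, and the producer never resends. The `PushedConsumer` class and its `capacity` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, List


class PushedConsumer:
    """Consumer with a bounded inbox; it cannot tell the producer to slow down."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.inbox: Deque[int] = deque()
        self.processed: List[int] = []
        self.dropped: List[int] = []

    def on_message(self, message: int) -> None:
        # Called by the producer, at the producer's pace.
        if len(self.inbox) >= self.capacity:
            # The message was published once and is not replayed, so it is lost.
            self.dropped.append(message)
        else:
            self.inbox.append(message)

    def process_one(self) -> None:
        # The consumer works through its inbox at its own, slower pace.
        if self.inbox:
            self.processed.append(self.inbox.popleft())


consumer = PushedConsumer(capacity=3)

# The producer pushes 10 messages; the consumer only manages to
# process one message for every two that arrive.
for message in range(10):
    consumer.on_message(message)
    if message % 2 == 1:
        consumer.process_one()

print("processed:", consumer.processed)
print("still queued:", list(consumer.inbox))
print("dropped:", consumer.dropped)
```

Once the inbox fills up, every message that arrives while it is full is dropped, and the consumer has no way to ask for it again. Real systems usually mitigate this with an intermediate buffer or broker, but the producer still sets the pace.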
