Push vs. Pull
Understand the difference between push-based ingestion and pull-based ingestion.
From a network communication perspective, there are two main ingestion strategies: push and pull. In a push strategy, the source system sends data to the target. In a pull strategy, the target reads data directly from the source. We will examine the differences between these two strategies and learn the pros and cons of each.
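The difference comes down to which side initiates the transfer. The following is a minimal in-process sketch of that idea, not a real network implementation. The class names `PushSource` and `PullSource`, and the `deliver`, `offset`, and `limit` parameters, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List


class PushSource:
    """Source that initiates the transfer: it calls the target for each new record."""

    def __init__(self, deliver: Callable[[str], None]) -> None:
        # `deliver` stands in for a network call to the target (assumption).
        self._deliver = deliver

    def produce(self, record: str) -> None:
        # The record is sent as soon as it exists; the target never asks for it.
        self._deliver(record)


class PullSource:
    """Source that only stores records: the target decides when and how much to read."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def produce(self, record: str) -> None:
        self._records.append(record)

    def read(self, offset: int, limit: int) -> List[str]:
        # The target chooses the starting position and the batch size.
        return self._records[offset:offset + limit]


# Push: the target is just a list that receives whatever the source sends.
pushed: List[str] = []
push_source = PushSource(deliver=pushed.append)
push_source.produce("a")
push_source.produce("b")

# Pull: nothing reaches the target until it asks.
pull_source = PullSource()
for record in ("a", "b", "c"):
    pull_source.produce(record)
pulled = pull_source.read(offset=0, limit=2)
```

In the push case the target has no say in when records arrive, while in the pull case it can re-read from any offset and pick its own batch size.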
Push
In push-based ingestion, data is pushed from the source to the target as soon as it becomes available. The source can be an active source generating a huge amount of data, like an IoT device, or a less active source, like a chat application.
Advantages
- Real time: Whenever the source receives new data, it immediately pushes it to the destination without waiting for a request. Push ingestion is more efficient for sources that constantly produce data.
- Immutable data: With a push-based solution, each record is typically sent once and not modified afterward, which makes it suitable for auditing purposes.
- Security: Source systems are more secure in push-based solutions because they don't listen for incoming network connections. All requests are authenticated on the consumer side, so there is less chance of the source system being attacked.
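The security point can be illustrated with a small sketch in which only the target accepts requests and checks credentials, while the source exposes no inbound entry point. This is an in-process simulation under stated assumptions: the `Target` and `Source` classes and the shared-token scheme are hypothetical, and a real system would use its own authentication mechanism over the network.

```python
import hmac
from typing import List


class Target:
    """Consumer side: the only component that accepts requests and authenticates them."""

    def __init__(self, expected_token: str) -> None:
        self._expected_token = expected_token
        self.accepted: List[str] = []

    def receive(self, token: str, record: str) -> bool:
        # Constant-time comparison to avoid leaking token contents through timing.
        if not hmac.compare_digest(token.encode(), self._expected_token.encode()):
            return False
        self.accepted.append(record)
        return True


class Source:
    """Producer side: makes outbound calls only and has no method for others to call in."""

    def __init__(self, target: Target, token: str) -> None:
        self._target = target
        self._token = token

    def push(self, record: str) -> bool:
        return self._target.receive(self._token, record)


target = Target(expected_token="s3cret")
trusted = Source(target, token="s3cret")
untrusted = Source(target, token="wrong")

ok = trusted.push("reading-1")        # accepted by the target
rejected = untrusted.push("reading-2")  # refused by the target
```

Because the source never listens for connections, an attacker has no endpoint on the source to probe. The authentication burden sits entirely with the target.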
Disadvantages
However, push-based ingestion has a few downsides:
- Replayability: The source system will only publish each message once. If the consumer misses some messages, it’s hard to get them back.
- Difficult flow control: In a push system, the source or the intermediate engine controls the flow rate. Consumers might be overwhelmed if the consumption rate falls far behind the production rate. It’s also tricky for producers to fine-tune every consumer’s flow rate.
- Passive consumer: The fact that consumers are not able to control how they receive data introduces other inconveniences like not being able to define batch size. The producer lacks the ...