Stream Ingestion
Learn about three message delivery semantics and error handling in stream ingestion, with a real-life example using BigQuery.
As opposed to batch ingestion, stream ingestion handles real-time events. Data is consumed and loaded as soon as it's created at the source. Demand for streaming solutions is growing because more companies want real-time insights. For example, an online retail company may want to personalize the user experience based on users' online activity. As users browse the website, we can ingest their activity into a streaming framework in real time and show relevant recommendations while they shop.
However, ingesting streaming data can be challenging. Unlike the batch solution, there is no staging area. Data is delivered to the consumer immediately, so it’s hard to ensure consistency. After the source generates an event, it may never reach the destination due to network issues, causing data loss. An event may also be retried and delivered multiple times, causing duplication. Let’s look at some fundamental message delivery semantics so that we can design a solid stream ingestion layer.
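To make the loss and duplication scenarios above concrete, here is a minimal Python sketch. It does not use a real streaming library: `Destination`, `send`, `fire_and_forget`, and `send_with_retry` are hypothetical names invented for illustration, and the network failures are simulated with flags rather than a real connection.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical sink that records every event it receives."""
    received: list = field(default_factory=list)

    def write(self, event_id: str) -> None:
        self.received.append(event_id)


def send(dest: Destination, event_id: str,
         drop_request: bool = False, drop_ack: bool = False) -> bool:
    """Simulate one network round trip; return True only if the ack arrives.

    drop_request: the event is lost before it reaches the destination.
    drop_ack: the event is written, but the acknowledgement is lost.
    """
    if drop_request:
        return False
    dest.write(event_id)
    return not drop_ack


def fire_and_forget(dest: Destination, event_id: str, failures: list) -> None:
    """Send once and never retry: a lost request means a lost event."""
    send(dest, event_id, **failures[0])


def send_with_retry(dest: Destination, event_id: str, failures: list) -> bool:
    """Retry until acknowledged: a lost ack means the event is written again."""
    for failure in failures:  # one simulated failure mode per attempt
        if send(dest, event_id, **failure):
            return True
    return False


# Scenario 1: the request is dropped and the source never retries (data loss).
lossy = Destination()
fire_and_forget(lossy, "evt-1", [{"drop_request": True}])
print(lossy.received)  # []

# Scenario 2: the first ack is dropped, so the source retries (duplication).
duplicated = Destination()
send_with_retry(duplicated, "evt-1", [{"drop_ack": True}, {}])
print(duplicated.received)  # ['evt-1', 'evt-1']
```

In the second scenario the source cannot tell a lost acknowledgement from a lost event, so retrying is the safe choice for avoiding data loss, at the cost of a possible duplicate at the destination.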
Message delivery semantics
Streaming platforms like Apache Kafka and Google Pub/Sub support three message delivery semantics: at-most-once, at-least-once, and ...