Commits and Offsets
This lesson explains the offsets that consumers commit to remember their position in the partitions they read.
So far, we have seen that Kafka consumers poll topic partitions for records using the poll() method, which returns some number of records for the consumer to process. However, if a rebalance occurs and a partition is assigned to a different consumer within the consumer group, the newly assigned consumer must know where the previous consumer stopped so that it can resume reading the partition from that point.
The offset identifies the position in a partition up to which a consumer has read. The act of durably storing or updating that position is called the commit. Unlike some other messaging systems, Kafka doesn’t track acknowledgments from the consumers of read records. Instead, the onus of tracking a consumer’s position within a partition is on the consumer itself. Each consumer commits its offset for every partition it is reading by writing a message to a special Kafka topic called __consumer_offsets.
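To make the idea concrete, here is a minimal in-memory sketch of how a committed position lets a second consumer pick up where the first one stopped. It is not the Kafka client API: the class `OffsetStoreSketch`, its `commit` and `fetch` methods, and the group and partition names are all hypothetical, and a plain map stands in for the __consumer_offsets topic. The sketch follows this lesson's convention that the committed value is the last position processed; the real Kafka client instead commits the offset of the next record to read.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the __consumer_offsets topic; NOT the Kafka client API.
final class OffsetStoreSketch {
    // Key: "group/topic-partition", value: last committed position.
    private final Map<String, Long> committed = new HashMap<>();

    // Durably storing the position is modeled as a simple map update.
    void commit(String group, String topicPartition, long offset) {
        committed.put(group + "/" + topicPartition, offset);
    }

    // Returns the committed position, or -1 if the group never committed.
    long fetch(String group, String topicPartition) {
        return committed.getOrDefault(group + "/" + topicPartition, -1L);
    }

    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();

        // Consumer A processes records up to position 4 and commits.
        store.commit("orders-group", "orders-0", 4L);

        // After a rebalance, consumer B in the same group takes over the
        // partition and looks up where consumer A stopped.
        long resumeAfter = store.fetch("orders-group", "orders-0");
        System.out.println("Consumer B resumes after position " + resumeAfter);
    }
}
```

The position is keyed by consumer group and partition rather than by individual consumer, which is why any member of the group can resume the work after a rebalance.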
Committed offset less than last record read
Consider a scenario where a consumer reads four messages at a time. It has read up to message 6, but the last committed offset is recorded as 4. If the consumer were to crash at this point and another consumer took over this partition, the new consumer would start reading from the record numbered 5. As a result, records 5 and 6 would be processed twice.
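The arithmetic of this scenario can be sketched in a few lines. The class `DuplicateReadSketch` and its `reprocessed` method are hypothetical helpers written for illustration, and they assume the numbering used in this lesson, where the committed value is the last record whose processing was recorded.

```java
import java.util.ArrayList;
import java.util.List;

final class DuplicateReadSketch {
    // Records a replacement consumer will process again: everything after the
    // last committed position, up to the last position already processed.
    static List<Long> reprocessed(long lastCommitted, long lastProcessed) {
        List<Long> duplicates = new ArrayList<>();
        for (long pos = lastCommitted + 1; pos <= lastProcessed; pos++) {
            duplicates.add(pos);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Values from the scenario: processed up to 6, committed 4.
        System.out.println(reprocessed(4, 6)); // prints [5, 6]
    }
}
```

When the committed position equals the last processed position, the list is empty and nothing is repeated. The wider the gap between the two, the more records a replacement consumer processes a second time.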