Transactions, Storage Layout, and Other Guarantees
Let’s get an overview of Kafka’s transactions, its physical storage layout, and the guarantees it provides.
Transactional client
Kafka provides a transactional client that lets producers write messages to multiple partitions atomically, even when those partitions belong to different topics.
A transactional client also makes it possible to atomically commit consumer offsets for a source topic in Kafka and produce messages to a destination topic in Kafka. This makes it possible to provide exactly-once guarantees for an end-to-end pipeline that reads from and writes to Kafka. This is achieved with a two-phase commit protocol in which the brokers of the cluster act as the transaction coordinator. The coordinator is highly available because it relies on the same underlying mechanisms for partitioning, leader election, and fault-tolerant replication that Kafka uses for ordinary topics.
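To see why committing offsets and output together matters, here is a toy Python model. It is not Kafka's API; `run_pipeline`, `atomic`, and `crash_at` are hypothetical names. It simulates a process that crashes once after producing output for a record but before committing that record's offset. When output and offset are committed separately, the restarted process reprocesses the record and the output contains a duplicate. When they are committed as one unit, the uncommitted output is discarded with the crash, much as the writes of an aborted transaction would be.

```python
def run_pipeline(source, atomic, crash_at):
    """Toy read-process-write loop that doubles each source record.

    atomic=True  : output and the consumer offset are committed together,
                   so a crash discards the not-yet-committed output.
    atomic=False : output becomes visible immediately and the offset is
                   committed separately afterwards (at-least-once).
    crash_at     : source offset at which the process crashes once, after
                   processing the record but before committing its offset.
    """
    committed_output = []
    committed_offset = 0  # next source offset to read after a restart
    crashed = False
    while committed_offset < len(source):
        record = source[committed_offset]
        pending = [record * 2]
        if not atomic:
            committed_output.extend(pending)  # written outside any transaction
        if committed_offset == crash_at and not crashed:
            crashed = True
            continue  # crash: offset not committed, restart from committed_offset
        if atomic:
            committed_output.extend(pending)  # output visible together with offset
        committed_offset += 1
    return committed_output


print(run_pipeline([1, 2, 3], atomic=False, crash_at=1))  # [2, 4, 4, 6]
print(run_pipeline([1, 2, 3], atomic=True, crash_at=1))   # [2, 4, 6]
```

The duplicate `4` in the non-atomic run is exactly the kind of reprocessing that committing offsets inside the same transaction as the output is meant to prevent.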
The coordinator stores the status of each transaction in a separate log (the internal `__transaction_state` topic). The messages that belong to a transaction are stored in their own partitions as usual.
When a transaction is committed, the coordinator is responsible for writing a commit marker to every partition that contains messages of the transaction, as well as to the partitions storing the consumer offsets. If the transaction is aborted, an abort marker is written instead.
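The commit path can be sketched as a small in-memory model. This is a hypothetical toy, assuming a simplified log layout of `(kind, txn_id, payload)` entries; the class names and state labels are illustrative and only loosely follow the states Kafka's coordinator records. Data is appended to each partition as soon as it is written, the decision is recorded in the coordinator's own log first (the first phase), and only then are markers written to every participating partition (the second phase).

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Append-only log of (kind, txn_id, payload) entries (toy layout)."""
    name: str
    log: list = field(default_factory=list)

    def append(self, kind, txn_id, payload=None):
        self.log.append((kind, txn_id, payload))


class ToyCoordinator:
    """Hypothetical two-phase-commit coordinator, loosely modelled on Kafka's."""

    def __init__(self):
        self.txn_log = []       # coordinator's own log of transaction states
        self.participants = {}  # txn_id -> {partition name: Partition}

    def begin(self, txn_id):
        self.participants[txn_id] = {}
        self.txn_log.append((txn_id, "ONGOING"))

    def write(self, txn_id, partition, payload):
        # Messages go straight into their own partition, as usual, and the
        # coordinator remembers which partitions the transaction touched.
        self.participants[txn_id][partition.name] = partition
        partition.append("DATA", txn_id, payload)

    def end(self, txn_id, commit=True):
        outcome = "COMMIT" if commit else "ABORT"
        # Phase 1: record the decision in the coordinator's log.
        self.txn_log.append((txn_id, "PREPARE_" + outcome))
        # Phase 2: write a marker to every participating partition,
        # including the one standing in for the consumer offsets.
        for partition in self.participants.pop(txn_id).values():
            partition.append(outcome, txn_id)
        self.txn_log.append((txn_id, "COMPLETE_" + outcome))


orders = Partition("orders-0")
offsets = Partition("offsets-0")  # hypothetical stand-in for a consumer-offsets partition
coordinator = ToyCoordinator()
coordinator.begin("t1")
coordinator.write("t1", orders, "order-42")
coordinator.write("t1", offsets, ("my-group", "input-0", 43))
coordinator.end("t1", commit=True)
print(orders.log)  # [('DATA', 't1', 'order-42'), ('COMMIT', 't1', None)]
```

Recording the decision before writing any marker is what makes the protocol recoverable in this model: if the coordinator failed between the two phases, a successor could read `PREPARE_COMMIT` from the transaction log and finish writing the markers.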
Consumers can also specify the isolation level under which they read: read_committed or read_uncommitted. Under read_committed, messages that are part of a transaction become readable from a partition only after a commit marker has been written for that transaction, and messages from aborted transactions are never returned. Under read_uncommitted, consumers see all messages, including those of ongoing or aborted transactions. This interaction is summarised in the following illustration.
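The difference between the two isolation levels can be illustrated with a toy reader over a single partition's log. This sketch assumes a simplified log of `(kind, txn_id, payload)` entries, with `txn_id` set to `None` for non-transactional writes; `read` is a hypothetical helper, not a Kafka consumer API. Under `read_committed` it returns messages in log order, skips those of aborted transactions, and stops at the first message whose transaction has no marker yet, which roughly mirrors how Kafka limits such consumers to the last stable offset.

```python
def read(log, isolation_level):
    """Return the payloads a consumer would see from one partition (toy model)."""
    if isolation_level == "read_uncommitted":
        # Every data message is visible, whatever its transaction's fate.
        return [payload for kind, _, payload in log if kind == "DATA"]
    if isolation_level != "read_committed":
        raise ValueError(f"unknown isolation level: {isolation_level}")

    # Outcome of each transaction whose marker is already in the log.
    outcome = {txn: kind for kind, txn, _ in log if kind in ("COMMIT", "ABORT")}
    visible = []
    for kind, txn, payload in log:
        if kind != "DATA":
            continue                 # markers themselves are never returned
        if txn is None:
            visible.append(payload)  # non-transactional write
        elif txn not in outcome:
            break                    # open transaction: nothing past here yet
        elif outcome[txn] == "COMMIT":
            visible.append(payload)  # aborted messages are silently skipped
    return visible


partition_log = [
    ("DATA", "t1", "a"),
    ("DATA", "t2", "b"),
    ("COMMIT", "t1", None),
    ("ABORT", "t2", None),
    ("DATA", "t3", "c"),   # t3 is still open
    ("DATA", None, "d"),   # non-transactional write after it
]
print(read(partition_log, "read_uncommitted"))  # ['a', 'b', 'c', 'd']
print(read(partition_log, "read_committed"))    # ['a']
```

Once a `("COMMIT", "t3", None)` marker is appended, the same `read_committed` call returns `['a', 'c', 'd']`: the open transaction was holding back both its own message and the later non-transactional one.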