Transactions, Storage Layout, and other Guarantees

Let’s look at transactions in Kafka, its physical storage layout, and the guarantees it provides.

Transactional client

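Kafka provides a transactional client that allows producers to write messages to multiple partitions, possibly across several topics, atomically: either all of the messages in a transaction become visible to consumers reading committed data, or none of them do.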

A transactional client can also commit consumer offsets for a source topic in Kafka and produce messages to a destination topic in Kafka atomically. This makes it possible to provide exactly-once guarantees for an end-to-end pipeline. Kafka achieves this with a two-phase commit protocol in which the brokers of the cluster play the role of the transaction coordinator. The coordinator is highly available because it uses the same underlying mechanisms for partitioning, leader election, and fault-tolerant replication as regular topics.
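As a rough sketch of this consume-transform-produce pattern, the function below runs one cycle inside a single transaction. The method names follow the confluent-kafka Python client (`consume`, `begin_transaction`, `send_offsets_to_transaction`, and so on), but the choice of client is an assumption, and `process_batch`, `output_topic`, and `transform` are hypothetical names used only for illustration. The sketch also assumes that the producer was created with a `transactional.id` and has already called `init_transactions()`, and that the consumer has auto-commit disabled.

```python
def process_batch(consumer, producer, output_topic, transform, batch_size=100):
    """Run one consume-transform-produce cycle inside a single transaction.

    Returns the number of input messages processed.
    """
    messages = consumer.consume(num_messages=batch_size, timeout=1.0)
    # Error events are simply skipped here for brevity.
    messages = [m for m in messages if m.error() is None]
    if not messages:
        return 0

    producer.begin_transaction()
    try:
        for msg in messages:
            producer.produce(output_topic, key=msg.key(), value=transform(msg.value()))
        # The consumed offsets travel with the transaction, so they are
        # committed if and only if the produced messages are committed.
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata(),
        )
        producer.commit_transaction()
    except Exception:
        # Neither the output messages nor the offsets take effect.
        producer.abort_transaction()
        raise
    return len(messages)
```

If the transaction aborts, neither the output messages nor the offsets become visible to read_committed consumers, so the same input can be processed again. A production loop would also need to rewind the consumer to its last committed offsets and distinguish fatal errors from retriable ones, which this sketch leaves out.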

The coordinator stores the status of a transaction in a separate log. The messages contained in a transaction are stored in their own partitions as usual.

When a transaction is committed, the coordinator is responsible for writing a commit marker to every partition that contains messages of the transaction, as well as to the partitions that store the consumer offsets.
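To make the division of labour concrete, here is a minimal in-memory sketch, not Kafka's actual implementation. The names `Partition`, `ToyCoordinator`, and `COMMIT_MARKER`, as well as the state labels, are assumptions made for illustration. It shows the three points above: transaction status lives in the coordinator's own log, data lands in ordinary partitions, and committing means recording the decision first and then writing a marker to every partition involved.

```python
COMMIT_MARKER = "<COMMIT>"  # stand-in for Kafka's commit control record


class Partition:
    """An append-only log of (transaction_id, payload) entries."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def append(self, txn_id, payload):
        self.log.append((txn_id, payload))


class ToyCoordinator:
    def __init__(self):
        self.txn_log = []     # the coordinator's separate log: (txn_id, state)
        self.partitions = {}  # txn_id -> partitions registered for that transaction

    def begin(self, txn_id):
        self.partitions[txn_id] = []
        self.txn_log.append((txn_id, "ONGOING"))

    def add_partition(self, txn_id, partition):
        # The producer registers each partition before writing to it.
        if partition not in self.partitions[txn_id]:
            self.partitions[txn_id].append(partition)

    def commit(self, txn_id):
        # Phase 1: record the decision in the transaction log.
        self.txn_log.append((txn_id, "PREPARE_COMMIT"))
        # Phase 2: write a commit marker to every registered partition.
        for partition in self.partitions.pop(txn_id):
            partition.append(txn_id, COMMIT_MARKER)
        self.txn_log.append((txn_id, "COMPLETE_COMMIT"))


coordinator = ToyCoordinator()
output = Partition("output-0")
offsets = Partition("consumer-offsets-0")

coordinator.begin("txn-1")
for partition, payload in [(output, "processed record"), (offsets, {"source-0": 42})]:
    coordinator.add_partition("txn-1", partition)
    partition.append("txn-1", payload)  # data goes to its own partition as usual
coordinator.commit("txn-1")

print(output.log)
print(offsets.log)
print(coordinator.txn_log)
```

Because the decision is recorded before any marker is written, a coordinator that fails halfway through the second phase can, in principle, be replaced by one that reads the transaction log and finishes writing the remaining markers.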

Consumers can also specify the isolation level they want to read under: read_committed or read_uncommitted. With read_committed, messages that are part of a transaction become readable from a partition only after a commit marker has been written for the associated transaction. With read_uncommitted, messages are readable as soon as they are written, whether or not their transaction has been committed. This interaction is summarised in the following illustration:
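The effect of the two isolation levels can also be sketched in code. The toy `read` function below is an assumption-laden model, not the consumer's real fetch logic: a partition is a list of `(transaction_id, value)` pairs, `COMMIT` and `ABORT` are hypothetical sentinels standing in for control markers, a `transaction_id` of `None` means a non-transactional message, and each id is assumed to name exactly one transaction. Under read_committed, the reader stops at the first message of a still-open transaction, which mirrors how Kafka consumers do not read past the last stable offset.

```python
COMMIT = object()  # stand-in for a commit marker
ABORT = object()   # stand-in for an abort marker


def read(log, isolation_level):
    """Return the values an application would see under the given isolation level."""
    # Collect the outcome of every transaction that already has a marker.
    outcome = {txn: value for txn, value in log if value is COMMIT or value is ABORT}

    visible = []
    for txn, value in log:
        if value is COMMIT or value is ABORT:
            continue  # markers are never handed to the application
        if isolation_level == "read_uncommitted":
            visible.append(value)
        elif txn is None or outcome.get(txn) is COMMIT:
            visible.append(value)
        elif txn not in outcome:
            break  # open transaction: nothing past this point is readable yet
        # otherwise the transaction was aborted and the message is skipped
    return visible


log = [
    (None, "a"),     # non-transactional
    ("t1", "b"),
    ("t2", "c"),     # t2 is still open
    ("t1", COMMIT),
    (None, "d"),
]

print(read(log, "read_uncommitted"))
print(read(log, "read_committed"))

log.append(("t2", ABORT))  # once t2 is resolved, later messages become readable
print(read(log, "read_committed"))
```

In this model, message "d" is held back under read_committed even though it is not transactional, simply because it sits behind an unresolved transaction in the same partition.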
