Efficiency of Kafka
Learn about the design decisions of Kafka that make it efficient.
Kafka has a few features that make it efficient, like simple storage, efficient transfer, and a stateless broker.
Simple storage
Kafka uses a simple storage layout. Its key aspects are discussed below.
Implementation of a partition
A partition could be implemented as a single large file. However, Kafka does not keep data forever; it has to remove older data from the disk to make room for new data. If a partition were a single large file, it would be hard to find and remove the data that is no longer needed. Therefore, each partition of a topic is implemented as a logical log made up of a set of segment files of approximately the same size. This way, new messages are appended to the newest segment file, and old data is reclaimed by deleting the oldest segment file as a whole, without having to locate and delete part of a large file.
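The segment idea can be illustrated with a minimal in-memory sketch. It is not Kafka's implementation: the class name `SegmentedLog` and the parameters `segment_capacity` and `max_segments` are hypothetical, segments are plain Python lists rather than files, and retention is modeled as a simple cap on the number of segments.

```python
from collections import deque


class SegmentedLog:
    """Toy partition: an ordered sequence of fixed-capacity segments.

    Hypothetical sketch; real segments are files on disk and real
    retention is driven by time or total size, not a segment count.
    """

    def __init__(self, segment_capacity, max_segments):
        self.segment_capacity = segment_capacity  # messages per segment
        self.max_segments = max_segments          # retention limit
        self.segments = deque([[]])               # last entry is the active segment
        self.deleted_segments = 0

    def append(self, message):
        active = self.segments[-1]
        if len(active) >= self.segment_capacity:
            # Active segment is full: roll over to a new one.
            active = []
            self.segments.append(active)
            if len(self.segments) > self.max_segments:
                # Cleanup drops the oldest segment as a whole;
                # nothing inside a segment is ever searched or edited.
                self.segments.popleft()
                self.deleted_segments += 1
        active.append(message)

    def messages(self):
        # Messages still retained, oldest first.
        return [m for segment in self.segments for m in segment]


log = SegmentedLog(segment_capacity=2, max_segments=2)
for i in range(5):
    log.append(i)
print(log.messages())  # messages that survived retention
```

Because cleanup only ever removes a whole segment from the front, its cost does not depend on how many messages the partition holds.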
Whenever a producer publishes a message to a partition, the broker appends it to the last segment file, known as the active segment. Segment files are flushed to the disk. However, to achieve better performance, the broker does not flush on every message; it waits until a certain amount of data has accumulated or a certain amount of time has passed, whichever happens first. A segment file usually holds up to 1 GB or a week’s worth of data; once either limit is reached, the broker starts a new segment. Consumers can only read a message after it has been flushed to the disk.
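The flush rule described above ("enough data or enough time, whichever happens first") can be sketched as follows. This is an illustrative model under stated assumptions, not Kafka's code: `BufferedSegment`, `flush_bytes`, and `flush_interval_s` are hypothetical names, the "disk" is a second in-memory list, and the clock is passed in explicitly as `now` so the behavior is deterministic.

```python
class BufferedSegment:
    """Toy active segment with a size-or-time flush policy (hypothetical)."""

    def __init__(self, flush_bytes, flush_interval_s):
        self.flush_bytes = flush_bytes            # flush once this much is buffered
        self.flush_interval_s = flush_interval_s  # or once this much time has passed
        self.buffer = []          # appended but not yet flushed
        self.flushed = []         # stands in for data on disk
        self.buffered_bytes = 0
        self.last_flush = 0.0

    def append(self, message, now):
        self.buffer.append(message)
        self.buffered_bytes += len(message)
        self.maybe_flush(now)

    def maybe_flush(self, now):
        size_hit = self.buffered_bytes >= self.flush_bytes
        time_hit = now - self.last_flush >= self.flush_interval_s
        if self.buffer and (size_hit or time_hit):
            # Whichever threshold is reached first triggers the flush.
            self.flushed.extend(self.buffer)
            self.buffer.clear()
            self.buffered_bytes = 0
            self.last_flush = now
            return True
        return False

    def read(self):
        # Consumers only see messages that have been flushed.
        return list(self.flushed)


segment = BufferedSegment(flush_bytes=10, flush_interval_s=5.0)
segment.append(b"abc", now=1.0)       # below both thresholds: stays buffered
segment.append(b"defghijk", now=2.0)  # size threshold reached: flushed
print(len(segment.read()))
```

Batching writes this way trades a short delay in visibility for fewer, larger disk writes.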