Evaluation of Kafka
Let's recap how Kafka fulfills its promised functionalities.
Kafka promised to be efficient at collecting data from multiple producers in parallel, retaining that data, and delivering it to multiple consumers simultaneously. It also promised to deliver large volumes of data in real time. Let's look at the evidence for these claims by comparing the performance of Kafka with Apache ActiveMQ (a popular open-source implementation of Java Message Service (JMS)) and RabbitMQ (a messaging system known for its performance).
All the
Performance improvements
To check the improved performance of Kafka, we’ll have to analyze the messages going from producer to brokers and from brokers to consumers.
Producer throughput
ActiveMQ and RabbitMQ don't have any simple way to send batched messages, so each of them sends only one message to the broker at a time. In the experiment, a single producer on each system published 10 million messages of 200 bytes each. Kafka was run with batch sizes of 1 and 50 and published about 50,000 and 400,000 messages per second, respectively. These results are orders of magnitude better than ActiveMQ's and about twice as good as RabbitMQ's.
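To put the Kafka figures above in perspective, here is a small back-of-the-envelope sketch in Python. It uses only the numbers quoted in the paragraph. The function name `summarize` is hypothetical, and the request count assumes one request per full batch, which is a simplification rather than a statement about Kafka's wire protocol.

```python
# Figures quoted in the text; nothing here is measured by this script.
TOTAL_MESSAGES = 10_000_000
MESSAGE_SIZE_BYTES = 200
REPORTED_RATE = {1: 50_000, 50: 400_000}  # batch size -> messages per second


def summarize(batch_size: int) -> dict:
    """Derive run time, payload bandwidth, and request count for a batch size."""
    rate = REPORTED_RATE[batch_size]
    return {
        "batch_size": batch_size,
        # Time needed to publish all messages at the reported rate.
        "seconds_to_publish": TOTAL_MESSAGES / rate,
        # Payload bytes only, in megabytes (10^6 bytes) per second.
        "payload_mb_per_second": rate * MESSAGE_SIZE_BYTES / 1_000_000,
        # Assumption: exactly one request per full batch.
        "requests_sent": TOTAL_MESSAGES // batch_size,
    }


if __name__ == "__main__":
    for size in (1, 50):
        print(summarize(size))
```

Under these assumptions, the larger batch size cuts both the number of requests and the total publishing time, which is the effect the comparison is highlighting.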
Kafka's producer performs better for the following reasons:
Kafka's producer doesn't wait for any kind of acknowledgment from the broker. It sends messages as fast as the broker can process them.
Kafka has a simple and efficient storage format. On average, each message carried only 9 bytes of overhead in Kafka, as opposed to 144 bytes in ActiveMQ. ActiveMQ's overhead comes from two sources:
A large message header that JMS requires
Maintenance of indexing structures
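The effect of the per-message overhead can be estimated with simple arithmetic. The sketch below combines the average overheads quoted above (9 and 144 bytes) with the 200-byte, 10-million-message workload from the throughput experiment. Pairing those figures is an assumption made for illustration, and `storage_footprint` is a hypothetical helper name.

```python
# Average per-message overhead as quoted in the text.
OVERHEAD_BYTES = {"Kafka": 9, "ActiveMQ": 144}

# Assumption: reuse the workload from the producer throughput experiment.
PAYLOAD_BYTES = 200
TOTAL_MESSAGES = 10_000_000


def storage_footprint(system: str) -> dict:
    """Estimate bytes stored per message and in total for one system."""
    overhead = OVERHEAD_BYTES[system]
    per_message = PAYLOAD_BYTES + overhead
    return {
        "bytes_per_message": per_message,
        "total_bytes": per_message * TOTAL_MESSAGES,
        # Fraction of every stored byte that is overhead rather than payload.
        "overhead_share": overhead / per_message,
    }


if __name__ == "__main__":
    for name in OVERHEAD_BYTES:
        result = storage_footprint(name)
        print(name, result["bytes_per_message"], f'{result["overhead_share"]:.1%}')
```

With these inputs, overhead is a small fraction of each stored Kafka message but a large fraction of each ActiveMQ message, so the broker has noticeably fewer bytes to write and transfer per message.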
Batch processing
Kafka's batching is key to this improvement because sending messages in a batch spreads the remote procedure call (RPC) overhead across many messages. Moreover, if the producer and broker are far away from each other, batching makes better use of each round trip, because one round-trip time (RTT) is spent on many messages instead of one. The accompanying illustration shows the throughput of Kafka's producer compared with ActiveMQ and RabbitMQ, and the size of the improvement that batching provides.
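The amortization argument above can be sketched with a toy cost model: every request pays a fixed overhead once, and every message adds a small incremental cost. This is not how Kafka was measured, and the default values of `per_request_overhead_s` and `per_message_cost_s` are hypothetical numbers chosen only to show the shape of the curve.

```python
def messages_per_second(
    batch_size: int,
    per_request_overhead_s: float = 100e-6,  # hypothetical fixed cost per request
    per_message_cost_s: float = 2e-6,        # hypothetical cost per message
) -> float:
    """Throughput of a sender that pays the fixed overhead once per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    time_per_batch = per_request_overhead_s + batch_size * per_message_cost_s
    return batch_size / time_per_batch


if __name__ == "__main__":
    baseline = messages_per_second(1)
    for size in (1, 10, 50, 200):
        rate = messages_per_second(size)
        print(f"batch={size:>3}  {rate:>10.0f} msg/s  speedup x{rate / baseline:.1f}")
```

In this model, throughput rises with batch size but by less than the batch size itself, because the per-message cost eventually dominates. Real systems add further limits, such as network bandwidth and broker capacity, that the sketch ignores.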