- Apache Kafka

Introduction to Kafka and its producer and consumer APIs.

Kafka

Kafka is an open-source streaming platform that was originally developed at LinkedIn. It is designed to handle real-time data streams with high throughput and low latency. Kafka itself is written in Java and Scala, but client libraries for many programming languages can produce and consume streams through its standardized APIs.

The platform scales to large datasets horizontally: data is split into partitions, and the partitions are distributed across a cluster of servers called brokers. While open-source Kafka is a self-hosted solution for message streaming, some cloud providers now offer fully managed versions of Kafka, such as Amazon MSK (Managed Streaming for Apache Kafka).
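To make the idea of partitioning concrete, here is a minimal sketch of how a record key might be mapped to a partition, and a partition to a broker. It is an illustration only: the `choose_partition` helper, the broker names, and the partition count are hypothetical, and `zlib.crc32` stands in for the hash function that Kafka's own default partitioner uses.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Illustrative only: CRC32 is a stand-in hash, not the one Kafka uses.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


# Hypothetical cluster: three brokers sharing six partitions round-robin.
brokers = ["broker-0", "broker-1", "broker-2"]
num_partitions = 6
leaders = {p: brokers[p % len(brokers)] for p in range(num_partitions)}

for key in [b"user-1", b"user-2", b"user-3", b"user-1"]:
    partition = choose_partition(key, num_partitions)
    print(key.decode(), "-> partition", partition, "on", leaders[partition])
```

Because the partition depends only on the key, records with the same key always land in the same partition, while different keys spread across the brokers.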

To show how Kafka can be integrated into a streaming workflow, we’ll use a single-node setup to get up and running. For a production environment, you’ll want to set up a multi-node cluster for redundancy and improved latency.
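As a rough preview of what producing and consuming against a single-node setup can look like, the sketch below uses the third-party `kafka-python` client. Several things here are assumptions rather than part of this lesson: the choice of `kafka-python`, a broker listening on `localhost:9092`, the topic name `events`, JSON as the message format, and the helper function names.

```python
import json

TOPIC = "events"           # hypothetical topic name
BROKER = "localhost:9092"  # assumed address of the single local broker


def encode(record: dict) -> bytes:
    """Serialize a record to UTF-8 encoded JSON bytes."""
    return json.dumps(record).encode("utf-8")


def decode(raw: bytes) -> dict:
    """Deserialize UTF-8 encoded JSON bytes back into a record."""
    return json.loads(raw.decode("utf-8"))


def produce_one(record: dict) -> None:
    """Send a single record to the topic (requires a running broker)."""
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=BROKER, value_serializer=encode)
    producer.send(TOPIC, record)
    producer.flush()  # block until buffered records have been sent
    producer.close()


def consume_forever() -> None:
    """Print every record on the topic, starting from the earliest offset."""
    from kafka import KafkaConsumer  # third-party: pip install kafka-python

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKER,
        value_deserializer=decode,
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(message.partition, message.offset, message.value)
```

With a broker running, calling `produce_one({"user_id": 1, "event": "login"})` in one process and `consume_forever()` in another should show the record arriving on the consumer side.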

Since the focus of this chapter is model application, we won’t dig into the details of setting up Kafka for high availability, and instead, I recommend ...
