Introduction to Kafka Streams
Learn about the core concepts of Kafka Streams and architectural considerations like scalability and fault tolerance.
What is Kafka Streams?
Kafka Streams is a Java library for building real-time, scalable, and fault-tolerant streaming applications that process data in motion. It allows developers to build complex stream processing applications using simple and concise Java code, leveraging the power of Kafka’s distributed architecture.
Some of its key benefits include the following:
It’s only a library: Kafka Streams is a Java library, not a platform. We can treat it like any other Java dependency and add it to new or existing applications. Because it is only a library, Kafka Streams applications are also easy to deploy and scale: we can keep our existing deployment model or choose among options such as on-premises servers, the cloud, Docker containers, and Kubernetes.
Tight integration with Apache Kafka: Kafka Streams uses Apache Kafka as its underlying storage and messaging system, so it inherits many of Kafka’s benefits, such as scalability, fault tolerance, and high availability. A Kafka cluster is also the only external system a Kafka Streams application depends on, which keeps deployment simple in practice.
Stateless and stateful processing: While stateless operations are common, Kafka Streams also supports stateful computations on streaming data. State stores hold the state that these computations build up, and Interactive Queries let applications read that state (more on these concepts later in this lesson).
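To make the difference between stateless and stateful processing concrete, here is a minimal sketch in plain Java. It deliberately does not use the Kafka Streams API: the class name, the method names, and the in-memory map that stands in for a state store are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: plain Java, not the Kafka Streams API.
class StatelessVsStatefulSketch {

    // Stateless: every record is handled on its own, and nothing is
    // remembered between records.
    static List<String> dropEmpty(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            if (!record.isEmpty()) {
                out.add(record);
            }
        }
        return out;
    }

    // Stateful: the result for a record depends on the records seen before it.
    // The HashMap is an assumed stand-in for a state store.
    static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> store = new HashMap<>();
        for (String key : keys) {
            store.merge(key, 1L, Long::sum); // increment the running count
        }
        return store;
    }

    public static void main(String[] args) {
        List<String> events = List.of("login", "", "click", "login");
        List<String> cleaned = dropEmpty(events);
        System.out.println(cleaned);
        System.out.println(countByKey(cleaned));
    }
}
```

The filter can run on any record in isolation, while the count has to keep a running total per key somewhere. In this sketch that total lives in an ordinary map; in a Kafka Streams application it would live in a state store.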
The Kafka Streams APIs
Kafka Streams provides two types of APIs for building stream processing applications:
The DSL (domain-specific language) API ...