The DSL API’s Stateless Operations: Overview
Learn about the DSL API in Kafka Streams and the stateless operations it supports.
Kafka Streams DSL API
Kafka Streams provides a domain-specific language (DSL) API that’s designed to build complex stream processing applications in a simple and concise way. This is because the DSL API is based on functional programming principles, which makes it highly composable and modular. It supports a wide range of stream processing operations, including:
Stateless transformations (e.g., map and filter)
Stateful operations (e.g., aggregations)
Join and windowing operations
Let’s quickly review these operations.
Kafka Streams DSL API operations
Stateless transformations
Stateless operations in Kafka Streams are a set of operations that are designed to operate independently on each data record in a stream. These operations do not require any internal state to be maintained across data records. Therefore, they are ideal for simple transformations or filtering operations. Stateless operations in Kafka Streams can be used to perform a wide range of data transformations such as filtering, mapping, flattening, and more. In addition, stateless operations can be easily combined with other stateful operations in Kafka Streams to create complex processing pipelines that can handle a wide range of data processing tasks.
Stateful operations
Stateful operations in Kafka Streams are a set of operations that require the internal state to be maintained across multiple data records. These operations are typically used for more complex transformations or aggregations that require data to be processed over time, rather than on a record-by-record basis. Stateful operations are important because they allow developers to perform complex computations on data streams that require context and historical information. For example, an aggregation operation that computes the average temperature over the last hour requires historical information about the temperature readings from the previous hour.
Join and windowing operations
Join operations in Kafka Streams allow developers to combine data from multiple streams based on a common key. They are commonly used in stream processing applications to perform data enrichment, where data from multiple streams is combined to produce a more complete or useful dataset. A popular pattern is to make the information in the databases available in Kafka using the Kafka Connect API (covered later). Then, implement applications that leverage the Kafka Streams API to perform fast and efficient local joins of such tables and streams, rather than requiring the application to make a query to a remote database over the network for each record.
Windowing operations in Kafka Streams allow developers to group data records into time-based windows for aggregation and analysis. It lets us control how to group records that have the same key for stateful operations, such as aggregations or joins, into windows that are tracked per record key. Supported window types in Kafka Streams include hopping, time window, tumbling time window, sliding time window, and session time windows.
The DSL API’s key interfaces
These interfaces are part of the org.apache.kafka.streams.kstream
package.
The KStream API
The KStream
API represents a stream of records in a Kafka topic. It is a continuous, unbounded sequence of data that is processed in real time. The KStream
API provides a functional programming interface for processing the data stream, allowing developers to apply various transformations and operations to the data.
The KTable API
The KTable
API represents a changelog of updates to a stream of data. It is a distributed, fault-tolerant, and scalable representation of a table that is continuously updated in real time. The KTable
API provides a relational database-like interface for querying and processing data, allowing developers to perform a wide range of join and aggregation operations on the stream. The KTable
API is designed for stateful operations requiring internal state management, such as real-time database updates or event-driven microservices.
The GlobalKTable API
In contrast to the KTable
API that is partitioned over all KafkaStreams
instances, the GlobalKTable
API is fully replicated per KafkaStreams
instance. Every partition of the underlying topic is consumed by each GlobalKTable
, such that the full set of data is available in every KafkaStreams
instance. This allows performing joins with the KStream
API without repartitioning the input stream.
Now that we have an overview of the DSL API in Kafka Streams, let’s cover stateless operations (supported by the KStream
API) in detail.
The KStream stateless operations
Let’s explore some of the important stateless operations supported by Kafka Streams.
Common operations
filter
: This operation allows us to apply a boolean condition to each record in a stream and only pass through the records that satisfy the condition. This operation does not modify the keys or values of the records, but simply removes records that do not meet the specified condition. Thefilter
method takes aPredicate
as its argument, representing the boolean condition to be applied to each record.
Get hands-on with 1300+ tech skills courses.