Kafka Streams Stateful Operations
Learn about using stateful operations in Kafka Streams.
Stateful operations, also referred to as aggregations, combine multiple input values into a single output value. Kafka Streams supports the following stateful operators:
reduce: This combines multiple input values of type A into a single output of type A.
aggregate: This combines multiple input values of type A into a single output of type B.
count: This counts the number of events by key.
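To make the type distinction concrete, here is a minimal plain-Java sketch of what each operator computes per key. It deliberately avoids the Kafka Streams API: the class `AggregationSemantics`, its method signatures, and the sample records are hypothetical, and they only model the semantics on an in-memory list. The real operators run on grouped streams and also pass the key to the aggregator.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Hypothetical in-memory model of per-key aggregation; not the Kafka Streams API.
class AggregationSemantics {

    // reduce: values of type A in, one value of type A out per key.
    static <K, A> Map<K, A> reduce(List<Map.Entry<K, A>> records, BinaryOperator<A> reducer) {
        Map<K, A> result = new LinkedHashMap<>();
        for (Map.Entry<K, A> record : records) {
            // The first value for a key is stored as-is; later values are combined with it.
            result.merge(record.getKey(), record.getValue(), reducer);
        }
        return result;
    }

    // aggregate: values of type A in, one value of type B out per key.
    static <K, A, B> Map<K, B> aggregate(List<Map.Entry<K, A>> records,
                                         Supplier<B> initializer,
                                         BiFunction<B, A, B> aggregator) {
        Map<K, B> result = new LinkedHashMap<>();
        for (Map.Entry<K, A> record : records) {
            // Start from the initial value the first time a key is seen.
            B current = result.containsKey(record.getKey())
                    ? result.get(record.getKey())
                    : initializer.get();
            result.put(record.getKey(), aggregator.apply(current, record.getValue()));
        }
        return result;
    }

    // count: a special case of aggregate whose output type is Long.
    static <K, A> Map<K, Long> count(List<Map.Entry<K, A>> records) {
        return aggregate(records, () -> 0L, (total, ignored) -> total + 1);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> purchases = List.of(
                Map.entry("alice", 5), Map.entry("bob", 3), Map.entry("alice", 7));

        // Integer in, Integer out.
        Map<String, Integer> totals = reduce(purchases, Integer::sum);
        // Integer in, String out.
        Map<String, String> history = aggregate(purchases, () -> "",
                (acc, v) -> acc.isEmpty() ? String.valueOf(v) : acc + "+" + v);
        Map<String, Long> counts = count(purchases);

        System.out.println(totals);  // {alice=12, bob=3}
        System.out.println(history); // {alice=5+7, bob=3}
        System.out.println(counts);  // {alice=2, bob=1}
    }
}
```

Note that `reduce` cannot change the value type, which is why `aggregate` needs an initializer: there is no input value of type B to start from.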
Because aggregations involve combining multiple values associated with the same key, we have to group them before applying the aggregation operator.
Grouping
Grouping may be applied to both KStream and KTable, although KTable offers only groupBy. There are two operators used for grouping:
groupByKey: This groups records by their current key while preserving their original values.
groupBy: This groups records by a new key, which is useful when the records have no key or the key must change. This operation marks the stream for repartitioning: if a downstream operator depends on the new key, Kafka Streams creates a repartition topic, writes every record to it, and reads every record back from it. This makes groupBy more expensive than groupByKey.
Both operators filter out records for which the resulting key is null.
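As a rough illustration of the repartitioning cost, the sketch below re-keys a stream with groupBy before counting. It assumes the kafka-streams library is on the classpath. The class name, the topic names (`page-views`, `views-per-page`), and the record layout (key = user ID, value = page name) are hypothetical.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

// Hypothetical example class; topic names and record layout are assumptions.
class GroupByTopology {

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Assumed input: key = user ID, value = page name.
        KStream<String, String> views = builder.stream(
                "page-views", Consumed.with(Serdes.String(), Serdes.String()));

        KTable<String, Long> viewsPerPage = views
                // Re-key by page name; the key change marks the stream for repartitioning.
                .groupBy((userId, page) -> page,
                        Grouped.with(Serdes.String(), Serdes.String()))
                // The aggregation depends on the new key, so a repartition topic is needed.
                .count();

        viewsPerPage.toStream().to(
                "views-per-page", Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Printing the description should show an internal "-repartition" topic.
        System.out.println(build().describe());
    }
}
```

Building and describing a topology does not require a running broker, which makes `describe()` a convenient way to check whether a given grouping introduces a repartition topic.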
In the code below, we can see a very simple topology using the groupByKey operator:
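A minimal sketch of such a topology, assuming the kafka-streams library is on the classpath. The class name and the topic names (`user-events`, `user-event-counts`) are hypothetical, and String keys and values are an assumption about the input data.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

// Hypothetical example class; topic names and serdes are assumptions.
class GroupByKeyTopology {

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Assumed input: key = user ID, value = event payload.
        KStream<String, String> events = builder.stream(
                "user-events", Consumed.with(Serdes.String(), Serdes.String()));

        KTable<String, Long> countsByUser = events
                // Group by the existing key; the key is unchanged, so no repartitioning.
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                // Count the events seen for each key.
                .count();

        // Emit every updated count to an output topic.
        countsByUser.toStream().to(
                "user-event-counts", Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the processor graph without connecting to a broker.
        System.out.println(build().describe());
    }
}
```

To run the topology, the result of `build()` would be passed to a `KafkaStreams` instance together with the application's configuration properties.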