Kafka Streams Stateful Operations

Learn about using stateful operations in Kafka Streams.

Stateful operations, also referred to as aggregations, combine multiple input values into a single output value. Kafka Streams supports the following stateful operators:

  • reduce: This combines multiple input values of type A into a single output A.

  • aggregate: This combines multiple input values of type A into a single output B.

  • count: This counts the number of events by key.

Because aggregations involve combining multiple values associated with the same key, we have to group them before applying the aggregation operator.
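To make the difference between the three operators concrete, here is a minimal sketch in plain Java that mimics their per-key semantics on an in-memory list. It does not use the Kafka Streams API: the class AggregationSketch, the Event type, and the method signatures are hypothetical stand-ins chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Plain-Java illustration only; none of these names come from Kafka Streams.
class AggregationSketch {

    // Hypothetical stand-in for a keyed record with an Integer value.
    static final class Event {
        final String key;
        final int value;

        Event(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // reduce: Integer values in, one Integer out per key (type A -> type A).
    static Map<String, Integer> reduce(List<Event> events, BinaryOperator<Integer> reducer) {
        Map<String, Integer> result = new TreeMap<>();
        for (Event e : events) {
            result.merge(e.key, e.value, reducer);
        }
        return result;
    }

    // aggregate: Integer values in, one value of another type B out per key.
    static <B> Map<String, B> aggregate(List<Event> events,
                                        Supplier<B> initializer,
                                        BiFunction<B, Integer, B> aggregator) {
        Map<String, B> result = new TreeMap<>();
        for (Event e : events) {
            // Start from the initial value the first time a key is seen.
            B current = result.containsKey(e.key) ? result.get(e.key) : initializer.get();
            result.put(e.key, aggregator.apply(current, e.value));
        }
        return result;
    }

    // count: number of events seen per key.
    static Map<String, Long> count(List<Event> events) {
        Map<String, Long> result = new TreeMap<>();
        for (Event e : events) {
            result.merge(e.key, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("a", 1), new Event("b", 2), new Event("a", 3));

        // Sum the values per key.
        System.out.println(reduce(events, Integer::sum));
        // Join the values per key into a String, changing the output type.
        System.out.println(AggregationSketch.<String>aggregate(events, () -> "",
                (agg, v) -> agg.isEmpty() ? String.valueOf(v) : agg + "," + v));
        // Count the events per key.
        System.out.println(count(events));
    }
}
```

Note that every method groups by key first, in line with the requirement above: the result always holds one entry per key, never a single global value.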

Grouping

Grouping may be applied to both KStream and KTable. There are two operators used for grouping; KStream offers both of them, while KTable offers only groupBy:

  • groupByKey: This groups records by their current key while preserving their original values.

  • groupBy: This groups records by a new key, which is useful when the records have no key or when the key has to change. It is important to understand that this operation marks the stream for repartitioning. If a downstream operator depends on the new key, a repartition topic is created, and all the records of the stream are redistributed by writing them to that topic and reading them back. This has a higher performance cost than using groupByKey.

Both operators filter out records for which the resulting key is null.
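The difference between the two operators, and the null-key filtering, can be illustrated with a small plain-Java sketch. It is only a simulation on an in-memory list, not the Kafka Streams API, and it does not model repartitioning. The class GroupingSketch and its helper rec are hypothetical names introduced for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java simulation of the grouping semantics; not the Kafka Streams API.
class GroupingSketch {

    // Hypothetical helper that builds a key-value record; the key may be null.
    static <K, V> Map.Entry<K, V> rec(K key, V value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    // groupByKey: keep the current key and the original values.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> records) {
        return groupBy(records, r -> r.getKey());
    }

    // groupBy: derive a new key for each record, then group on it.
    static <K, V, KR> Map<KR, List<V>> groupBy(List<Map.Entry<K, V>> records,
                                               Function<Map.Entry<K, V>, KR> keySelector) {
        Map<KR, List<V>> groups = new LinkedHashMap<>();
        for (Map.Entry<K, V> r : records) {
            KR newKey = keySelector.apply(r);
            if (newKey == null) {
                continue; // records whose resulting key is null are dropped
            }
            groups.computeIfAbsent(newKey, k -> new ArrayList<>()).add(r.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = Arrays.asList(
                rec("alice", "apple"),
                rec(null, "pear"),
                rec("bob", "banana"),
                rec("alice", "avocado"));

        // The record with the null key is filtered out.
        System.out.println(groupByKey(records));
        // Re-keying by the first letter of the value gives every record a key.
        System.out.println(groupBy(records, r -> r.getValue().substring(0, 1)));
    }
}
```

In this sketch the keyless "pear" record is dropped by groupByKey but kept by groupBy, because the selector assigns it a non-null key. This is the case mentioned above where groupBy helps with records that do not have a key.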

In the code below, we can see a very simple topology using the groupByKey operator:
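A minimal sketch of such a topology might look as follows. It assumes the kafka-streams library is on the classpath. The topic names clicks and clicks-per-user, the String serdes, and the class name GroupByKeyTopology are assumptions made for this example. The sketch only builds and prints the topology, so it does not need a running Kafka cluster.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

// Sketch only: topic names and serdes are assumptions, not taken from the lesson.
class GroupByKeyTopology {

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Read records keyed by user ID from the (assumed) input topic.
        KStream<String, String> clicks =
                builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String()));

        // Group by the existing key, then count the events for each key.
        KTable<String, Long> clicksPerUser = clicks
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .count();

        // Write the running counts to the (assumed) output topic.
        clicksPerUser.toStream()
                .to("clicks-per-user", Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Print the processor graph without starting the application.
        System.out.println(build().describe());
    }
}
```

Because groupByKey keeps the current key, this topology should not need a repartition topic. Replacing it with groupBy and a key selector would mark the stream for repartitioning, as described above.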
