Amazon Managed Streaming for Apache Kafka

Learn about the capabilities of Amazon MSK.

Apache Kafka is an open-source platform for ingesting and processing streaming data. It was developed at LinkedIn and released to the public in 2011. The software is named after the author Franz Kafka, a favorite of one of its developers, who thought the name suited “a system optimized for writing.”

Overview diagram of Amazon MSK

Amazon MSK allows developers to run Kafka on AWS. Launched in 2019, it takes over much of the work of managing and configuring the servers behind Kafka-based applications. For example, MSK attempts to detect and automatically recover from common failure scenarios for Kafka clusters so that the applications that depend on them can continue operating.

Why Apache Kafka

Apache Kafka was developed to solve one problem: how to ingest huge amounts of event data quickly and in real time so that it could feed machine learning (ML) and other algorithms. Around 2011, LinkedIn was using Kafka to ingest more than 1 billion events per day. More recently, ingestion rates have increased to over 1 trillion events per day. Companies such as Uber and Netflix also use Kafka to process huge amounts of real-time data for their own use cases.
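To get a feel for these volumes, the daily figures can be converted into average per-second rates. The snippet below is only a back-of-the-envelope sketch: the function name `average_events_per_second` is made up for illustration, and it assumes events arrive evenly over a 24-hour day, which real traffic does not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def average_events_per_second(events_per_day: int) -> float:
    """Convert a daily event count into an average per-second rate.

    Assumes a perfectly even arrival rate (an illustrative simplification).
    """
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Daily volumes quoted in the text above.
    volumes = [("1 billion/day", 10**9), ("1 trillion/day", 10**12)]
    for label, per_day in volumes:
        rate = average_events_per_second(per_day)
        print(f"{label}: about {rate:,.0f} events/s on average")
```

Under that assumption, 1 billion events per day averages out to roughly 11,600 events per second, and 1 trillion per day to roughly 11.6 million per second. Peak rates are not given in the text and would likely be higher than these averages.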

The need for Kafka arose because ML and similar algorithms can require large amounts of real-time data. Before Kafka, existing solutions couldn’t move billions of events from one place to another quickly enough for those algorithms to work effectively.

Kafka terminology

Apache Kafka is a distributed system that includes a set of servers within a Kafka cluster. A degree of ...