Real-Time Streaming Platforms

Learn about some of the common real-time streaming platforms.

Apache Kafka

Apache Kafka is one of the most widely used streaming platforms. It was originally created at LinkedIn and was later donated to the Apache Software Foundation.

Kafka is a distributed streaming platform. We can use it as a real-time messaging system with high fault tolerance. In other words, messages are delivered with low latency, and because they can be replicated across multiple brokers, the system can keep working without losing data when a single broker fails. A message can be anything from a string, to a serialized object, to a blob.

Kafka is also well known for its scalability. Among its numerous capabilities, Kafka is widely used to connect heterogeneous applications in a many-to-many manner.

For example, when the system receives a new stock price, it shows the price in the UI, updates the database with the new value, and stores the change event in HDFS or whichever data lake we use. We can do this by having multiple consumers read the same message and each act upon it independently. When the target is an external system such as a database or HDFS, Kafka Connect connectors can play this role.
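The stock price fan-out above can be sketched without a running broker. The following is a minimal in-memory illustration, not Kafka client code: `TopicLog`, `poll`, and the group names are hypothetical, and a real topic is partitioned and replicated. It only shows the idea that each consumer group keeps its own read position, so every group sees every message.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for a single-partition topic."""

    def __init__(self):
        self.records = []                # append-only log of messages
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def produce(self, message):
        self.records.append(message)

    def poll(self, group):
        """Return the messages this group has not read yet and advance its offset."""
        start = self.offsets[group]
        self.offsets[group] = len(self.records)
        return self.records[start:]


def run_stock_price_demo():
    topic = TopicLog()
    topic.produce({"symbol": "ACME", "price": 101.5})
    topic.produce({"symbol": "ACME", "price": 102.0})

    ui, database, data_lake = [], {}, []

    # Each group reads the full stream independently of the others.
    for event in topic.poll("ui"):
        ui.append(f"{event['symbol']}: {event['price']}")
    for event in topic.poll("database"):
        database[event["symbol"]] = event["price"]  # keep only the latest value
    for event in topic.poll("data-lake"):
        data_lake.append(event)                     # keep the full change history

    return ui, database, data_lake


if __name__ == "__main__":
    print(run_stock_price_demo())
```

Because offsets are tracked per group, adding a fourth consumer later would not affect the three existing ones.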

The Kafka ecosystem includes additional components, like Kafka Streams and KSQL (now ksqlDB), each of which has its own specialization.

Apache Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of data. Usually, this data consists of log files. For example, imagine we run a web app on multiple servers and want to collect the logs in one place. This would be a perfect use case for Apache Flume.

It has a simple and flexible architecture based on streaming data flows. Flume is robust and fault-tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.
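A Flume data flow is built from sources that receive events, channels that buffer them, and sinks that deliver them. The sketch below imitates that flow in plain Python for the multi-server log example. It is an illustration under stated assumptions, not Flume itself: `Channel`, `collect_logs`, and the parameters are hypothetical, and real Flume channels add transactions and optional disk persistence.

```python
from collections import deque


class Channel:
    """Hypothetical bounded buffer sitting between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            return False  # channel is full: the source has to retry later
        self.events.append(event)
        return True

    def take_batch(self, size):
        batch = []
        while self.events and len(batch) < size:
            batch.append(self.events.popleft())
        return batch


def collect_logs(server_logs, capacity=100, batch_size=2):
    """Move log lines from several servers into one central list."""
    channel = Channel(capacity)
    central_store = []
    for server, lines in server_logs.items():
        for line in lines:
            event = f"{server} {line}"
            # Source side: when the channel is full, let the sink drain a batch.
            while not channel.put(event):
                central_store.extend(channel.take_batch(batch_size))
    # Sink side: deliver whatever is still buffered.
    while channel.events:
        central_store.extend(channel.take_batch(batch_size))
    return central_store


if __name__ == "__main__":
    logs = {"web-1": ["GET /", "GET /a"], "web-2": ["POST /b"]}
    print(collect_logs(logs, capacity=2))
```

The bounded channel is what decouples the two sides: a slow sink only causes the source to wait, rather than lose events.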

Spark Streaming

Spark Streaming is an extension of the core Spark API and fits Spark's philosophy of being the “one and only tool for your big data processing.” It provides streaming capabilities in a high-throughput, fault-tolerant way.

It accepts input from many different sources, like Kafka or HDFS, and pushes the processed data to a persistence layer, such as a file system or a database, in a streaming way.
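Spark Streaming handles a stream by cutting it into small batches, one per time interval, and running ordinary batch processing on each of them. The following plain-Python sketch illustrates that micro-batch idea only. It does not use the Spark API, and `micro_batches`, `process_stream`, and the list used as a sink are hypothetical.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, value) pairs into consecutive fixed-length intervals."""
    batches = {}
    for timestamp, value in events:
        batch_index = int(timestamp // batch_interval)
        batches.setdefault(batch_index, []).append(value)
    # Intervals that received no events are skipped in this simplified sketch.
    return [batches[i] for i in sorted(batches)]


def process_stream(events, batch_interval, sink):
    """Summarize each micro-batch and append the result to a persistence layer."""
    for batch in micro_batches(events, batch_interval):
        sink.append({"count": len(batch), "total": sum(batch)})
    return sink


if __name__ == "__main__":
    readings = [(0.1, 5), (0.7, 3), (1.2, 4), (2.9, 1)]
    print(process_stream(readings, batch_interval=1, sink=[]))
```

A shorter interval lowers latency at the cost of more, smaller batches to schedule.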

Apache Storm

Apache Storm was initially created at BackType, and later open-sourced by Twitter, to perform real-time message processing. It might look similar to Kafka. However, unlike Kafka, it focuses on computation rather than on delivery.

Kafka is mostly a durable message log that works like a queue, whereas Storm can run multiple types of computations over the data as it arrives.
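Storm expresses such computations as a topology in which spouts emit tuples and bolts transform them. The sketch below mimics a small word-count topology with Python generators. It is an illustration of the model, not the Storm API: the function names are hypothetical, and a real topology runs its spouts and bolts in parallel across a cluster.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source that emits raw tuples into the topology."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: split each sentence into lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream):
    """Bolt: keep a running count and emit (word, count) after every word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences):
    """Wire spout -> split -> count and return the latest count per word."""
    latest = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[word] = count
    return latest


if __name__ == "__main__":
    print(run_topology(["Kafka delivers", "Storm computes", "kafka delivers fast"]))
```

Each stage processes one tuple at a time, so results are updated continuously instead of after the whole input has been read.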

Alternatives

These tools have a steep learning curve. Therefore, they’re not the best fit for a one-off gig or small projects.

Here are some tools that require less up-front learning and can achieve results of a similar nature:

  • The Pub/Sub functionality of the Redis database.
  • RabbitMQ.
  • WebSockets, to trigger some processing as a result of a new data event.
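All three alternatives rest on the same publish/subscribe idea: a producer sends a message to a named channel, and every current subscriber is notified. The sketch below shows that pattern in memory. The `PubSub` class is hypothetical and is not the Redis or RabbitMQ client API. Like Redis Pub/Sub, it simply drops messages published while nobody is subscribed.

```python
from collections import defaultdict


class PubSub:
    """Hypothetical in-memory publish/subscribe broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel name -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to the current subscribers and return how many received it."""
        callbacks = self.subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    broker = PubSub()
    received = []
    broker.subscribe("prices", received.append)
    broker.publish("prices", "ACME:101.5")
    print(received)
```

For a small project, this fire-and-forget behavior is often enough. When messages must survive restarts or be replayed, a log-based platform such as Kafka is a better fit.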
