Course Overview

Get a brief introduction to the course, its intended audience, and the key learning outcomes.

Welcome to this course on Apache Kafka!

Overview of Apache Kafka

Apache Kafka is a distributed streaming platform designed to handle real-time data streaming and processing in a scalable, fault-tolerant manner. It was originally developed at LinkedIn and became an open-source project under the Apache Software Foundation in 2011. Kafka functions as a message broker that enables clients to publish and read streams of data. Using Kafka, we can capture streams of events in real time from a variety of sources, such as databases and IoT devices. These data streams can then be stored, processed, and securely integrated with other parts of our system in a scalable and reliable manner.
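To make this concrete, here is a minimal sketch of publishing an event with Kafka's official Java client. The broker address (localhost:9092), topic name (events), and record contents are illustrative assumptions, and the kafka-clients library is assumed to be on the classpath:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class QuickstartProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumes a Kafka broker running locally on the default port
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // try-with-resources closes the producer, flushing any buffered records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a single event to a hypothetical "events" topic
            producer.send(new ProducerRecord<>("events", "sensor-1", "temperature=21.5"));
        }
    }
}

We will cover the Producer API and its key configurations in detail later in the course.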


Kafka is applicable to a wide range of use cases, including:

  • Real-time data streaming: Kafka is designed to handle large volumes of data in real time, making it ideal for applications requiring fast and efficient data stream processing.

  • Event-driven architectures: With Kafka, events can be processed in real time and trigger actions or workflows in other applications. This can be particularly useful in applications such as fraud detection, where real-time processing of events is critical to detecting and preventing fraud (a minimal consumer sketch follows this list).

  • Microservices: Since microservices comprise smaller, independent services, Kafka can be used as a messaging layer between these components. This allows for a scalable and loosely coupled architecture.

  • Log aggregation: This involves collecting and analyzing log data from various sources, such as system logs, application logs, and web server logs. By centralizing log data in Kafka, organizations can analyze and monitor it in real time, making issues easier to identify and troubleshoot.
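As a taste of the event-driven style mentioned above, here is a minimal sketch of a consumer that reacts to events as they arrive. As with the producer sketch earlier, the broker address, topic name, and consumer group ID are illustrative assumptions:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class QuickstartConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "event-processor");         // hypothetical consumer group
        props.put("auto.offset.reset", "earliest");       // start from the beginning if no offset exists
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // hypothetical topic
            while (true) {
                // Fetch whatever events have arrived since the last poll
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // In a real application, this is where an action or workflow would be triggered
                    System.out.printf("Received %s = %s%n", record.key(), record.value());
                }
            }
        }
    }
}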

Target audience

This course adopts a hands-on approach to learning Kafka. Along with core Kafka fundamentals, it also covers the ecosystem of projects (Kafka Streams, Kafka Connect, etc.) that are essential for building end-to-end solutions.

This is a course for software developers, data engineers, and other data professionals who want to learn Kafka to build data-intensive applications. It will prove helpful for anyone who wants to learn Kafka through a practical, hands-on approach using a full-fledged programming language like Java, rather than being limited to the CLI.

Prerequisites

Some programming experience with Java is preferable because we will use Java libraries to interact with Kafka.

Course contents

This course covers the following topics that will help build a solid foundation for Kafka:

  • An overview of the Kafka architecture, client libraries, and its ecosystem of projects

  • Hands-on examples of using Kafka Client APIs (Producer, Consumer, Admin), along with key configurations

  • Developing stream processing applications using Kafka Streams (with the Processor and DSL APIs), querying their state using Interactive Queries, and testing them

  • Using Kafka Connect source and sink connectors to build scalable data pipelines

  • Diving into key Kafka-related projects in addition to the core ecosystem, including the Spring Framework and Schema Registry

  • Best practices for each covered topic

Course structure and demo applications

Each lesson includes exercises in the form of quizzes and coding challenges to reinforce concepts. The course also features a project assignment where you apply the skills you have learned.

Some of the practical demonstrations covered in this course include:

  • Using the Kafka Producer, Consumer, and Admin APIs

  • Using the Kafka Streams DSL and Processor APIs to process real-time data flowing through Kafka topics

  • Using Kafka Connect source and sink connectors to build data pipelines to connect heterogeneous systems

  • Testing Kafka Streams applications

  • Using Kafka with the Spring Framework