Course Overview
Get a brief introduction to the course, its intended audience, and the key learning outcomes.
Welcome to this course on Apache Kafka!
Overview of Apache Kafka
Apache Kafka is a distributed streaming platform designed to handle real-time data streams and process them in a fault-tolerant manner. It was originally developed at LinkedIn and became an open-source project under the Apache Software Foundation in 2011. It functions as a message broker that enables clients to publish and read streams of data. Using Kafka, we can capture streams of events in real time from a variety of sources such as databases and IoT devices. These data streams can then be stored, processed, and securely integrated with other parts of our system in a scalable and reliable way.
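To make the publish/read model concrete, here is a minimal sketch of publishing a single event with Kafka's Java client. The broker address, the "events" topic, and the record contents are illustrative assumptions, not part of the course material:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        // Assumed setup: a broker reachable at localhost:9092 and a topic named "events".
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // try-with-resources flushes and closes the producer on exit.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-42") determines which partition the record is written to.
            producer.send(new ProducerRecord<>("events", "user-42", "signed-up"));
        }
    }
}
```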
Kafka is applicable to a wide range of use cases, including:
Real-time data streaming: Kafka is designed to handle large volumes of data in real time, making it ideal for applications requiring fast and efficient data stream processing.
Event-driven architectures: With Kafka, events can be processed in real time and trigger actions or workflows in other applications. This is particularly useful in applications such as fraud detection, where events must be processed the moment they occur.
Microservices: In a microservices architecture, Kafka can serve as the messaging layer between the small, independent services that make up the system, enabling a scalable and loosely coupled design.
Log aggregation: Log aggregation involves collecting log data from various sources, such as system logs, application logs, and web server logs. By centralizing logs in Kafka, organizations can analyze and monitor them in real time, making it easier to identify and troubleshoot issues (see the minimal consumer sketch after this list).
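As a rough illustration of the log-aggregation pattern, the sketch below subscribes to a topic and prints each record as it arrives. The broker address, the "app-logs" topic, and the "log-monitor" group id are all assumptions for the example:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LogConsumerSketch {
    public static void main(String[] args) {
        // Assumed setup: a broker at localhost:9092 and log lines published to "app-logs".
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "log-monitor");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("app-logs"));
            while (true) {
                // Fetch whatever records arrived since the last poll (waiting up to 1 second).
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition %d: %s%n", record.partition(), record.value());
                }
            }
        }
    }
}
```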
Target audience
This is a course for software developers, data engineers, and other data professionals who want to learn Kafka to build data-intensive applications. It will prove especially helpful to anyone who wants to learn Kafka with a practical, hands-on approach using a full programming language such as Java, rather than being limited to the CLI.
Along with core Kafka fundamentals, the course covers the ecosystem of projects (Kafka Streams, Kafka Connect, etc.) whose knowledge is critical to building end-to-end solutions.
Prerequisites
Some programming experience with Java is recommended, since we will use Java client libraries to interact with Kafka.
Course contents
This course covers the following topics that will help build a solid foundation for Kafka:
An overview of the Kafka architecture, client libraries, and its ecosystem of projects
Hands-on examples of using Kafka Client APIs (Producer, Consumer, Admin), along with key configurations
Developing stream processing applications using Kafka Streams (with the Processor and DSL APIs), querying their state with Interactive Queries, and testing them (see the minimal Kafka Streams sketch after this list)
Using Kafka Connect source and sink connectors to build scalable data pipelines
Diving into key Kafka-related projects in addition to the core ecosystem, including the Spring Framework and Schema Registry
Best practices for each covered topic
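As a taste of what the Kafka Streams lessons build toward, here is a minimal DSL sketch under assumed names: the "input-topic" and "output-topic" topics, the "uppercase-app" application id, and the uppercase transform are placeholders for the example:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application.id doubles as the consumer group id and the state-store prefix.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Topology: read from "input-topic", transform each value, write to "output-topic".
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(value -> value.toUpperCase())
              .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Close the topology cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```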
Course structure and demo applications
Each lesson includes exercises in the form of quizzes and coding challenges to reinforce the concepts covered. The course also includes a project assignment where you apply the skills you have learned.
Some of the practical demonstrations covered in this course include:
Using the Kafka Producer, Consumer, and Admin APIs
Using the Kafka Streams DSL and Processor APIs to process real-time data flowing through Kafka topics
Using Kafka Connect source and sink connectors to build data pipelines to connect heterogeneous systems
Testing Kafka Streams applications
Using Kafka with the Spring Framework