Gain insight into Apache Kafka's role in scalable data pipelines. Explore its theory and practice with interactive commands to build efficient and diverse data transmission solutions.

If you’re interested in Big Data, then Apache Kafka is a must-know tool. 

What started as an internal LinkedIn project to streamline data transmission and propagation among services has quickly grown into a mainstay platform for building highly scalable data pipelines. Meet Apache Kafka, the ubiquitous tool for building pipelines for diverse use cases, ranging from chronologically tracking user activity on a website to implementing publish-subscribe feeds.

This course introduces you to Kafka theory and provides you with a hands-on interactive browser-terminal to execute Kafka commands against a running Kafka broker.

Building Scalable Data Pipelines with Kafka

>If you are unfamiliar with Zookeeper, we would suggest you read the Zookeeper chapter in the appendix first and then come back to this lesson.

## Broker membership
So far, we have largely skipped the internal workings of a Kafka cluster and its interactions with consumers and producers. In this chapter, we'll take a closer look at how the different Kafka components work.

We'll start with the Kafka cluster, which consists of several brokers working together. Each broker maintains its membership in the cluster through a unique ID that is either set in the configuration file or generated automatically. On startup, a broker creates an ephemeral node named after its ID under the Zookeeper path `/brokers/ids`. Various Kafka components keep a watch on `/brokers/ids` and are notified whenever brokers join or leave the cluster. A new broker can't register itself with the same ID as a broker that is already registered.

A broker can lose connectivity to Zookeeper for a variety of reasons, such as:

- broker deliberately stopping
- garbage collector pause
- network partition

If such a situation occurs, the ephemeral node that the broker created at startup is automatically removed from Zookeeper, and the Kafka components watching the list of brokers are notified that the broker has left. Interestingly, if a brand new broker is spun up with the same ID as the broker that left, the new broker is assigned the same topics and partitions as the broker that left. This is because the ID of the departed broker is still retained within internal data structures even though the broker itself is gone. When a new broker comes along with the same ID, those data structures start referring to the new broker.
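
The membership behavior described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not how Kafka or Zookeeper is implemented: the `BrokerRegistry` class, its method names, and the host and partition names are all hypothetical, and a plain dictionary stands in for the ephemeral nodes that brokers create in Zookeeper.

```python
from typing import Callable, Dict, List, Optional


class BrokerRegistry:
    """Toy stand-in for the Zookeeper path where brokers register (hypothetical)."""

    def __init__(self) -> None:
        # Live "ephemeral nodes": broker ID -> host of the broker holding that ID.
        self._live: Dict[int, str] = {}
        # Callbacks that play the role of watches on the registration path.
        self._watchers: List[Callable[[str, int], None]] = []
        # Assignments are keyed by broker ID and are NOT cleared when a
        # broker leaves, mirroring the retained internal data structures.
        self.assignments: Dict[int, List[str]] = {}

    def watch(self, callback: Callable[[str, int], None]) -> None:
        """Register a callback that is told about every join and leave."""
        self._watchers.append(callback)

    def register(self, broker_id: int, host: str) -> None:
        """A broker joins; an ID held by a live broker is rejected."""
        if broker_id in self._live:
            raise ValueError(f"broker ID {broker_id} is already registered")
        self._live[broker_id] = host
        self._notify("joined", broker_id)

    def session_lost(self, broker_id: int) -> None:
        """Simulate the ephemeral node vanishing (shutdown, GC pause, partition)."""
        if self._live.pop(broker_id, None) is not None:
            self._notify("left", broker_id)

    def live_brokers(self) -> List[int]:
        return sorted(self._live)

    def host_for(self, broker_id: int) -> Optional[str]:
        return self._live.get(broker_id)

    def _notify(self, kind: str, broker_id: int) -> None:
        for callback in self._watchers:
            callback(kind, broker_id)


# Usage: broker 1 leaves, and a brand new machine reuses its ID.
events = []
registry = BrokerRegistry()
registry.watch(lambda kind, broker_id: events.append((kind, broker_id)))

registry.register(1, "host-a")
registry.assignments[1] = ["orders-0", "orders-1"]  # made-up partition names

registry.session_lost(1)          # the ephemeral node is removed
registry.register(1, "host-b")    # a new broker shows up with the same ID

# The assignments for ID 1 were never deleted, so they now refer to host-b.
print(registry.host_for(1), registry.assignments[1])
print(events)
```

Keying the assignments by broker ID rather than by host is what lets the replacement broker inherit the work of the one that left: nothing has to be copied or reassigned, because the lookup by ID simply resolves to the new machine.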

This lesson examines how Kafka brokers become members of a cluster and the role of a controller within a cluster.
