Zookeeper Atomic Broadcast
Learn about the consensus protocol Zookeeper Atomic Broadcast.
Zookeeper
Before we dive into the Zookeeper Atomic Broadcast Protocol (ZAB), let’s take a look at some of the core constructs of Zookeeper.
Zookeeper is a distributed system used to implement coordination services for applications. Some of the examples of coordination services include leader election and config management. It provides a file system-based hierarchical key-value store storing data in a tree-like structure.
znodes
znodes provide the primary abstraction in Zookeeper to represent a node in the key-value store. A znode stores data in a byte array and can have child nodes. Each znode also includes an additional data structure called a stat, which consists of the following metadata:
A transaction identifier (zxid) that created or modified the znode.
A transaction identifier (zxid) that created or modified its children.
Timestamp and version of the data change and child node change.
The owner of the znode.
There are three types of znodes:
Persistent znodes: These znodes remain permanent even after the client that created them dies.
Ephemeral znodes: These znodes remain transient and remain valid only during the active session of the client. When the client dies, Zookeeper automatically deletes ephemeral znodes.
Sequential znodes: Zookeeper automatically appends a sequential integer to the path that helps determine the path's order. Sequential znodes can be persistent or ephemeral.
Client and server
Apache Zookeeper has two components:
Client
Server
Clients are nodes that use the coordination service, and servers provide the coordination service. A set of servers forms a cluster called an ensemble.
When Zookeeper runs as an ensemble, it elects one of the servers as the leader, and the other servers are followers.
The leader is responsible for processing all modification requests and establishing the order of operations for their followers. The followers vote on the operations to commit the transaction.
Any of the servers in the ensemble can process read requests.
Leaders and followers are called participants since they participate in the leader election process. There is a third type of server called the observer, which does not participate in the voting process but acts as a passive learner of the changes applied to leaders and followers.
ZAB is the protocol used in Zookeeper to guarantee consensus on write operations and electing new leaders.
Quorum
Quorum is the minimum number of servers in the ensemble that should be alive for Zookeeper to respond to read and write requests. If N
is the total number of servers in the ensemble, quorum nodes should be at least N/2
. For example:
If
N
is5
, quorum nodes is5/2 = 2
, the database can tolerate a loss of3
nodes.If
N
is4
, quorum nodes is4/2 = 2
, the database can tolerate a loss of2
nodes.
Note: It is expected to have an odd number of servers for the quorum for high availability.
Zookeeper atomic broadcast
ZAB is the protocol used in Zookeeper for write operations and electing new leaders. It is similar to a two-phase commit.
Write operation
The client forwards the write operation to the leader of the ensemble. First, the leader converts the request into a transaction, indicating the sequence of operations that should be applied to the participant nodes and generating a transaction identifier. Write operations in Zookeeper are linearizable.
Once the transaction is generated, the leader applies the changes locally to its storage engine. Then, it uses the ZAB protocol to apply these changes to the follower nodes so that writes are persisted on a quorum of nodes:
The leader sends a
PROPOSAL
request that includes the transaction along withzxid
generated by the leader to be applied to follower nodes.Once the follower receives the
PROPOSAL
from the leader, they apply the transaction to their local storage engine and send anACK
message to the leader.Once the leader receives an
ACK
message from a quorum of servers, it sends aCOMMIT
request to followers withzxid
. Once theCOMMIT
is successful, writes are durable and made visible to clients.
Get hands-on with 1300+ tech skills courses.