Zookeeper Atomic Broadcast

Learn about the consensus protocol Zookeeper Atomic Broadcast.

Zookeeper

Before we dive into the Zookeeper Atomic Broadcast Protocol (ZAB), let’s take a look at some of the core constructs of Zookeeper.

Zookeeper is a distributed system used to implement coordination services for applications. Some of the examples of coordination services include leader election and config management. It provides a file system-based hierarchical key-value store storing data in a tree-like structure.

znodes

znodes provide the primary abstraction in Zookeeper to represent a node in the key-value store. A znode stores data in a byte array and can have child nodes. Each znode also includes an additional data structure called a stat, which consists of the following metadata:

  • A transaction identifier (zxid) that created or modified the znode.

  • A transaction identifier (zxid) that created or modified its children.

  • Timestamp and version of the data change and child node change.

  • The owner of the znode.

There are three types of znodes:

  • Persistent znodes: These znodes remain permanent even after the client that created them dies.

  • Ephemeral znodes: These znodes remain transient and remain valid only during the active session of the client. When the client dies, Zookeeper automatically deletes ephemeral znodes.

  • Sequential znodes: Zookeeper automatically appends a sequential integer to the path that helps determine the path's order. Sequential znodes can be persistent or ephemeral.

Client and server

Apache Zookeeper has two components:

  • Client

  • Server

Clients are nodes that use the coordination service, and servers provide the coordination service. A set of servers forms a cluster called an ensemble.

When Zookeeper runs as an ensemble, it elects one of the servers as the leader, and the other servers are followers.

  • The leader is responsible for processing all modification requests and establishing the order of operations for their followers. The followers vote on the operations to commit the transaction.

  • Any of the servers in the ensemble can process read requests.

Leaders and followers are called participants since they participate in the leader election process. There is a third type of server called the observer, which does not participate in the voting process but acts as a passive learner of the changes applied to leaders and followers.

ZAB is the protocol used in Zookeeper to guarantee consensus on write operations and electing new leaders.

Quorum

Quorum is the minimum number of servers in the ensemble that should be alive for Zookeeper to respond to read and write requests. If N is the total number of servers in the ensemble, quorum nodes should be at least N/2 . For example:

  • If N is 5, quorum nodes is 5/2 = 2 , the database can tolerate a loss of 3 nodes.

  • If N is 4, quorum nodes is 4/2 = 2 , the database can tolerate a loss of 2 nodes.

Note: It is expected to have an odd number of servers for the quorum for high availability.

Zookeeper atomic broadcast

ZAB is the protocol used in Zookeeper for write operations and electing new leaders. It is similar to a two-phase commit.

Write operation

The client forwards the write operation to the leader of the ensemble. First, the leader converts the request into a transaction, indicating the sequence of operations that should be applied to the participant nodes and generating a transaction identifier. Write operations in Zookeeper are linearizable.

Once the transaction is generated, the leader applies the changes locally to its storage engine. Then, it uses the ZAB protocol to apply these changes to the follower nodes so that writes are persisted on a quorum of nodes:

  • The leader sends a PROPOSAL request that includes the transaction along with zxid generated by the leader to be applied to follower nodes.

  • Once the follower receives the PROPOSAL from the leader, they apply the transaction to their local storage engine and send an ACK message to the leader.

  • Once the leader receives an ACK message from a quorum of servers, it sends a COMMIT request to followers with zxid. Once the COMMIT is successful, writes are durable and made visible to clients.

Get hands-on with 1300+ tech skills courses.