Primitives of ZooKeeper

Learn to implement some example primitives of ZooKeeper using the clientAPI.

ZooKeeper helps clients to implement a set of primitives using the client API. ZooKeeper servers do not know about these powerful primitives because there isn’t any function for each of them. For the servers, these are just a set of functions from the client API sent by the client. Some of the common primitives are wait-free, which are group membership and configuration management. These primitives are either updated once or very often, and are mostly used for reading which makes it wait-free. Others, like rendezvous and locks, are based on waiting unless an event, such as a watch, are not triggered.

Configuration management

ZooKeeper allows the client to implement dynamic configuration by using the following steps:

  1. Create a znode zcz_c with 8090 as the port value.
newZnode = create("/config/port", 8090, EPHEMERAL)
  1. Each process is called with the full path of its zcz_c and gets its configuration from zcz_c, whose watch flag is set to true, using the command below.
getData("/config/port", true)
  1. The configurations are not updated too often, but when they are, using the command below, the process is notified to use the updated configuration that sets the watch flag to true.
setData("/config/port", value, version)

Watches help ensure that the process which has set the watch flag to true gets updated information. Any process that has set the watch flag to true on the zcz_c will be notified so that it can use the updated value before it tries to read zcz_c. The process will be notified only once for multiple changes in the zcz_c before the process reads zcz_c. The aim is to notify the process that the information it has about zcz_c is stale and needs to get updated information from the server.

Rendezvous

Due to dynamic configuration in distributed systems, we mostly have no idea about the system’s final configuration. Let’s take the manager-worker architecture as an example where the client wishes to initiate multiple worker processes along with a manager process. The client uses a scheduler to schedule these manager and worker processes. The client itself doesn’t know when the manager process will be initiated. If a couple of worker processes are initiated before the manager, we cannot connect with the manager, since we don’t have the details (the IP address and port) of the manager yet. What should we do in such a scenario? ZooKeeper provides a solution to this problem. We call it the rendezvous znode, zrz_r.

Note: Though the original paper on ZooKeeper uses the term “master-worker”, we will use the term “manager-worker” to refer to the same thing.

Our solution consists of the following steps:

  1. Create a new rendezvous znode zrz_r, normally a regular znode. Since zrz_r is an ephemeral znode, it will not be deallocated as long as the manager and worker processes will be active in the memory.
newZnode = create("/rendezvous/candidateOne", "", EPHEMERAL)
  1. The client can put all worker processes that are started before the manager process to sleep by putting a watch on the newly formed zrz_r.
getData("/rendezvous/candidateOne", true)
  1. When the manager process will be initiated, the manager process will store its details in the created zrz_r. After updating zrz_r, it will notify all the worker processes initiated before the manager because the watch flag is set to true.
setData("/rendezvous/candidateOne", {"10.0.0.1", 8080})
  1. All the worker processes will get the details from zrz_r and start communicating with the manager.

Group membership

We have ensured that multiple clients can perform operations on the shared resources in Zookeeper. We have allowed clients to create groups so similar types of clients can share the workspace. We call this, group membership. For simplicity, we have used ephemeral znodes to implement group membership by using the following steps:

  1. Create a parent znode zgz_g that represents a group, such as member0001, as shown below.
newZnode = create("/GroupMembership/member0001", EHPEMERAL)
  1. Each process belonging to that group will create a child node of its own under the created zgz_g.

    i. Each process stores its information (the IP and port to whom it is listening).

    ii. Each process should give the child node under zgz_g a unique name or can use the SEQUENTIAL flag to generate a unique name, such as processOne, as shown below.

newZnode = create("/GroupMembership/member0001/processOne", EPHEMERAL,SEQUENTIAL)
  1. We can get the information of the group by simply listing all the children of that zgz_g​ , using the getChildren(path, watch) method. We can set the watch flag to true to monitor any changes in the group.
childrenList = getChildren("/GroupMembership/member0001", true)

Locks

While Zookeeper does not provide explicit locking mechanisms, clients can use its API to implement a locking service. Many applications use ZooKeeper for synchronization according to their requirements. Keeping the aim (simple implementation) in mind, ZooKeeper allows its clients to implement locks simply.

The client process uses the following steps to implement the locks:

  1. Create a znode, zlz_l, with the EPHEMERAL flag.
newZnode = create("/locks/lock-1", EPHEMERAL)
  1. If zlz_l is not created, the client process puts a watch on the locks.
exists("/locks", true)
  1. If zlz_l is created successfully, it means the lock has been acquired.

  2. The client process releases the lock either when the client is disconnected, or zlz_l is deleted.

delete(newZnode)
  1. Once zlz_l is deleted, all the clients’ processes waiting for the lock to be released will be notified because of the Step 2 and they will now try to acquire it.

Since simplicity comes with some tradeoffs, our proposed solution has the following issues:

  1. It causes a herd effect. As the name suggests, once the lock is released in Step 5, multiple clients want to acquire the lock.
  2. It only implements exclusive locksThese are locks that can be acquired only by one process at a time. For example, the lock for the write operation can be acquired only by one writer at a time..

Let’s discuss the solution to the issues mentioned above.

Simple locks without the herd effect

We have made the following changes to improve locking:

  1. When creating the znode, zlz_l, we have used the SEQUENTIAL flag along with EPHEMERAL, which will help us order each cleint’s attempt to acquire the lock, as shown in line 1 of the Lock procedure.

  2. Once the clients are numbered by their attempts, the client with the lowest sequence number will hold the lock, as shown in line 4 of the Lock procedure. The clients who don’t hold the lock have to wait for the lock to be released.

  3. Instead of notifying all the clients, it is better to inform the client who is next in line and check whether or not it is holding the lock, as shown in lines 6 to 8 of the Lock procedure.

Level up your interview prep. Join Educative to access 80+ hands-on prep courses.