Detailed Design of ZooKeeper
Understand the design of ZooKeeper in detail and learn more about its components.
We have discussed the ZooKeeper architecture at a high level in the previous lesson. Now it’s time to learn more about its components, the client API and the ZooKeeper servers, and how they interact.
Client API
The client API provides a set of functions allowing the client to communicate with the ZooKeeper server. The following are the functions/methods provided in the client API:
- create(path, data[], mode, flag): This method creates a znode (a ZooKeeper in-memory node in which data is stored). path specifies the location in the coordination data tree at which the znode is to be created. For example, /app1/ in the coordination data tree maintained in the ZooKeeper server’s memory is a path at which we can create a znode. data[] specifies the data to be stored in the created znode, and mode allows the client to choose whether the znode should be regular or ephemeral. create() returns the name of the created znode. If the sequential flag is set, the number from a monotonically increasing counter (one whose values only ever increase) is appended to the name of the znode.
The regular mode gives full control of the znode to the client. Only the client can create and delete such znodes, and they continue to exist even after the client is disconnected. All znodes in ZooKeeper are regular by default unless specified otherwise. An ephemeral znode is also created by the client, but the system deletes it once the session that created it has expired.
- setData(path, data[], version): This method writes data[] to the znode at the specified path, provided that the specified version matches the znode's current version.
- getData(path, watch): This method returns the data[] that was set through setData(), along with the metadata of the znode at the specified path. The watch flag allows the client to set a watch on that znode. A watch is a notification mechanism that informs the registered client when the data in that znode has been updated so that the client can fetch the updated data. A client can register for watches only during a read operation.
- getChildren(path, watch): This method returns the names of all the children of the znode at the specified path, and the watch flag allows the client to set a watch on that znode’s children.
- exists(path, watch): This method checks whether a znode exists at the specified path and returns its metadata. The watch flag allows the client to set a watch on that znode.
For getData() and getChildren(), the watch is set only if the znode at the specified path exists. Watches can be set only through the read methods: getData(), getChildren(), and exists().
- delete(path, version): This method deletes the znode at the given path if the given version matches the znode's current version. If either parameter doesn't match, the znode is not deleted.
- sync(path): This method waits for all the pending requests on the znode at the specified path, generated by the client connected to the server, to be completed.
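To make the version and mode semantics above concrete, here is a minimal in-memory sketch. It is not the real ZooKeeper client: `ZNodeTree`, `BadVersion`, `ephemeral_owner`, and `expire_session` are hypothetical names invented for this illustration, and the single global counter for sequential names is a simplifying assumption.

```python
class BadVersion(Exception):
    """Raised when the caller's version does not match the znode's version."""


class ZNodeTree:
    """Toy stand-in for the coordination data tree (hypothetical, in-memory only)."""

    def __init__(self):
        # path -> {"data": bytes, "version": int, "owner": session id or None}
        self.nodes = {}
        self.counter = 0  # assumption: one global counter for sequential names

    def create(self, path, data, ephemeral_owner=None, sequential=False):
        if sequential:
            # Append a monotonically increasing, zero-padded number to the name.
            path = f"{path}{self.counter:010d}"
            self.counter += 1
        if path in self.nodes:
            raise KeyError(f"znode already exists: {path}")
        self.nodes[path] = {"data": data, "version": 0, "owner": ephemeral_owner}
        return path  # the (possibly suffixed) name of the created znode

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        node = self.nodes[path]
        return node["data"], node["version"]

    def set_data(self, path, data, version):
        node = self.nodes[path]
        if version != node["version"]:
            raise BadVersion(path)  # stale writers are rejected
        node["data"] = data
        node["version"] += 1

    def delete(self, path, version):
        if version != self.nodes[path]["version"]:
            raise BadVersion(path)
        del self.nodes[path]

    def expire_session(self, session_id):
        # The system, not the client, removes ephemeral znodes of an expired session.
        for path in [p for p, n in self.nodes.items() if n["owner"] == session_id]:
            del self.nodes[path]


tree = ZNodeTree()
tree.create("/app1", b"config-v1")  # regular znode
first = tree.create("/app1/task-", b"", sequential=True)
second = tree.create("/app1/task-", b"", sequential=True)
tree.create("/app1/worker", b"alive", ephemeral_owner="session-1")

data, version = tree.get_data("/app1")
tree.set_data("/app1", b"config-v2", version)  # succeeds: version matches
try:
    tree.set_data("/app1", b"config-v3", version)  # same version is now stale
    stale_write_rejected = False
except BadVersion:
    stale_write_rejected = True

tree.expire_session("session-1")  # the ephemeral znode disappears
```

The version check is what lets two clients update the same znode safely: the second writer learns that its view is stale instead of silently overwriting the first writer's data.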
Note: By using these functions, one has the ability to develop a range of coordination artifacts.
ZooKeeper exposes znodes as shared registers, with watches serving as an event-driven mechanism comparable to cache invalidation in distributed systems. In this way, ZooKeeper provides a simple yet powerful coordination service.
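The cache-invalidation analogy can be sketched as follows. This is a toy model, not the ZooKeeper client: `WatchedRegister` and `CachingClient` are hypothetical names, and the sketch assumes that a watch fires once and must be registered again on the next read.

```python
class WatchedRegister:
    """Toy single znode that supports watches (hypothetical, in-memory only)."""

    def __init__(self, data):
        self.data = data
        self.watchers = []

    def get_data(self, watch=None):
        # A watch can only be registered as part of a read.
        if watch is not None:
            self.watchers.append(watch)
        return self.data

    def set_data(self, data):
        self.data = data
        # Fire each watch once, then forget it (one-shot assumption).
        pending, self.watchers = self.watchers, []
        for notify in pending:
            notify()


class CachingClient:
    """Client that caches the register's data until a watch invalidates it."""

    def __init__(self, register):
        self.register = register
        self.cache = None
        self.valid = False

    def read(self):
        if not self.valid:
            # Re-read the data and re-register the watch in the same call.
            self.cache = self.register.get_data(watch=self._invalidate)
            self.valid = True
        return self.cache

    def _invalidate(self):
        # The notification carries no data; it only marks the cache as stale.
        self.valid = False


register = WatchedRegister(b"v1")
client = CachingClient(register)
first_read = client.read()            # caches b"v1" and sets a watch
register.set_data(b"v2")              # the watch fires and invalidates the cache
valid_after_update = client.valid
second_read = client.read()           # fetches b"v2" and sets a new watch
```

Note that the notification only tells the client that something changed; the client still has to read the znode again to obtain the new data.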
Server
The ZooKeeper service is replicated: each server keeps a complete copy of the data, so the service has no single point of failure. Replication also distributes the request load and keeps the service available at each server. The collection of these replicated servers is called the ZooKeeper ensemble. All these servers work together to provide services to the client. One server is elected as the leader, and the others become the followers.
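As a rough illustration of why replication helps availability, consider the toy ensemble below. `Ensemble`, `crash`, and the choice of server 0 as the leader are assumptions made for this sketch; it models neither leader election nor the real replication protocol.

```python
class Ensemble:
    """Toy ensemble in which every server keeps a full copy of the data."""

    def __init__(self, size):
        self.replicas = [dict() for _ in range(size)]
        self.alive = [True] * size
        self.leader = 0  # assumption: server 0 was elected leader

    def write(self, path, data):
        # Apply the write on every live server so that all copies stay identical.
        for server, replica in enumerate(self.replicas):
            if self.alive[server]:
                replica[path] = data

    def read(self, server, path):
        # Any live server can answer a read from its own copy.
        if not self.alive[server]:
            raise ConnectionError(f"server {server} is down")
        return self.replicas[server][path]

    def crash(self, server):
        self.alive[server] = False


ensemble = Ensemble(size=3)
ensemble.write("/app1", b"config")
ensemble.crash(2)                        # one follower fails
from_leader = ensemble.read(0, "/app1")
from_follower = ensemble.read(1, "/app1")
```

Because every surviving server holds the same data, a client that loses its server can reconnect to another one and keep working.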
Unlike in Chubby, clients are not required to connect only to the leader (called the primary replica in Chubby); they can also connect to the followers (called replicas in Chubby) to perform operations. This design decision gives ZooKeeper high availability, but it does not provide strong consistency unless clients also go to the leader for reads. The sync() method discussed above can be used to synchronize, but only on an as-needed basis. The leader and the followers differ in their roles as follows:
- The leader: The leader, on receiving a client’s write request, broadcasts the operation to the followers, performs the write operation on the coordination data placed in its memory, and acknowledges the client.
- The follower: The follower can also receive write requests from clients and respond to them. Multiple write requests are queued in the server so that they can be executed in FIFO order. However, each write request must first be forwarded to the leader, which broadcasts it to all the other followers. After broadcasting the request to the followers, the leader responds to the follower that forwarded the write request to it. Then, that follower replies to the client’s write request. Broadcasting the write requests ensures that each server eventually has the same data to show the client. For read requests, the follower doesn’t need to forward the request to the leader and