Evaluation of ZooKeeper
Let's evaluate ZooKeeper's design.
Throughput
In this lesson, we’ll evaluate ZooKeeper’s design. To keep things simple, we’ll discuss the throughput of read/write requests, load management, atomic broadcast, and system failures. Before moving forward, let’s look at the system requirements used for the evaluation.
Note: The experiment and results are taken from Hunt, Patrick, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. “ZooKeeper: Wait-free Coordination for Internet-scale Systems.” In 2010 USENIX Annual Technical Conference (USENIX ATC 10). 2010.
System specification
For the evaluation of the system, the number of servers was varied, but the number of clients was always 250. Even though the clients are implemented in both Java and C, we used Java servers and asynchronous Java clients in the experiment. Each client keeps at least 100 requests outstanding to the server, where each request is a read or a write of 1 KB of data. Since the performance of all the requests, such as create(), setData(), getData(), and many more, excluding sync(), is approximately the same, we won’t discuss these functions individually. To keep the session active, the client reports the count of completed operations at a fixed interval, and the status was recorded periodically. To ensure that the servers don’t get overwhelmed, concurrent requests are throttled to at most 2,000, as shown in the table below.
System Specification for Testing
| Attributes | Values |
| --- | --- |
| Number of clients | 250 |
| Language | Java |
| Requests per client | 100 (at least) |
| Request data size | 1 KB |
| Maximum concurrent request executions | 2,000 (at most) |
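
To make this setup concrete, here is a minimal sketch of what such an asynchronous Java benchmark client could look like, using ZooKeeper’s asynchronous API to keep a bounded number of 1 KB read and write requests in flight against a single znode. The connect string (localhost:2181), the znode path (/bench), the read fraction, and the per-client throttle are illustrative assumptions, not the paper’s exact harness.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.ZooKeeper;

public class ThroughputClient {
    public static void main(String[] args) throws Exception {
        // Illustrative ensemble address and znode; the znode /bench is assumed to exist.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });
        byte[] payload = new byte[1024];          // 1 KB per request, as in the experiment
        Semaphore inFlight = new Semaphore(100);  // keep roughly 100 requests outstanding per client
        AtomicLong completed = new AtomicLong();
        double readFraction = 0.9;                // read-to-write ratio under test

        AsyncCallback.DataCallback readDone =
                (rc, path, ctx, data, stat) -> { completed.incrementAndGet(); inFlight.release(); };
        AsyncCallback.StatCallback writeDone =
                (rc, path, ctx, stat) -> { completed.incrementAndGet(); inFlight.release(); };

        while (true) {
            inFlight.acquire();  // throttle: block once the outstanding-request window is full
            if (Math.random() < readFraction) {
                zk.getData("/bench", false, readDone, null);         // read: served from local state
            } else {
                zk.setData("/bench", payload, -1, writeDone, null);  // write: goes through the leader
            }
            // A real harness would periodically report completed.get() to compute throughput.
        }
    }
}
```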
Read/write requests
As discussed above, the system was tested with different numbers of servers while varying the read-to-write ratio. Read throughput is higher than write throughput because read operations don’t go through atomic broadcast. For example, with three servers, the write throughput is 21k requests per second, far less than the read throughput of 87k requests per second, as shown in the table below.
Throughput Performance at Extremes of System
| Servers | 100% Reads (requests/sec) | 100% Writes (requests/sec) |
| --- | --- | --- |
| 13 | 460k | 8k |
| 9 | 296k | 12k |
| 7 | 257k | 14k |
| 5 | 165k | 18k |
| 3 | 87k | 21k |
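
As a quick sanity check on these numbers, the snippet below (an illustrative calculation over the table above, not part of the original experiment) computes the per-server read throughput: reads scale roughly linearly with the ensemble size, at about 29k–37k requests per second per server, while total write throughput keeps falling as servers are added.

```java
public class ThroughputTable {
    public static void main(String[] args) {
        // Values taken from the table above (100% reads and 100% writes).
        int[] servers = {3, 5, 7, 9, 13};
        int[] reads   = {87_000, 165_000, 257_000, 296_000, 460_000};
        int[] writes  = {21_000, 18_000, 14_000, 12_000, 8_000};

        for (int i = 0; i < servers.length; i++) {
            System.out.printf("%2d servers: %,7d reads/s (%,d per server), %,7d writes/s%n",
                    servers[i], reads[i], reads[i] / servers[i], writes[i]);
        }
        // Reads stay at roughly 29k-37k per server as servers are added,
        // while total write throughput drops from 21k to 8k.
    }
}
```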
The following are the two reasons why write requests take more time than read requests (a simplified sketch of this write path follows the list):
- Write requests incur atomic broadcast overhead, which read requests avoid.
- Each server logs the transaction to disk before acknowledging the leader.
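
The sketch below is a simplified, illustrative simulation of these two costs, not ZooKeeper’s actual Zab implementation: the leader broadcasts a proposal to its followers, each follower forces the transaction to its log before acknowledging, and the leader commits only once a majority quorum has acknowledged. A read, by contrast, never enters this round.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WritePathSketch {
    static final int SERVERS = 5;                  // illustrative ensemble size
    static final int FOLLOWERS = SERVERS - 1;      // one leader plus four followers
    static final int QUORUM = SERVERS / 2 + 1;     // majority needed to commit (3 of 5)

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(FOLLOWERS);
        // The leader's own log write counts toward the quorum, so it waits for QUORUM - 1 follower acks.
        CountDownLatch acks = new CountDownLatch(QUORUM - 1);

        for (int i = 0; i < FOLLOWERS; i++) {      // leader broadcasts the proposal to every follower
            pool.submit(() -> {
                logToDisk();                       // follower forces the txn to its log first...
                acks.countDown();                  // ...and only then acknowledges the leader
            });
        }

        acks.await();                              // leader commits once a quorum has acknowledged
        System.out.println("write committed");
        pool.shutdown();
    }

    static void logToDisk() {
        // Stand-in for the disk logging (fsync) that precedes a follower's acknowledgement.
        try { Thread.sleep(5); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```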
Atomic broadcast may seem like excessive overhead, but its goal is reliability, so we trade some performance for it. Increasing the number of servers degrades broadcast (and therefore write) performance, but it makes the system more fault tolerant. For example, write throughput drops from 21k requests per second with three servers to 8k requests per second with 13 servers, as the table above shows.
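
To see what the lost write throughput buys, the short snippet below computes, for each ensemble size in the table, the majority quorum a write must reach and the number of server failures the ensemble can tolerate: with 2f + 1 servers, ZooKeeper tolerates f failures.

```java
public class QuorumTradeoff {
    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7, 9, 13}) {
            int quorum = n / 2 + 1;       // majority of the ensemble needed to commit a write
            int tolerated = (n - 1) / 2;  // server crashes the ensemble can survive
            System.out.printf("%2d servers -> write quorum %d, tolerates %d failures%n",
                    n, quorum, tolerated);
        }
    }
}
```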