Detailed Design of Chubby: Part IV

Learn about fault-tolerance, scalability, and availability of Chubby's design.

Failovers

Nodes can fail, and we need to have an excellent strategy to minimize downtime. One way to reduce downtime is a fast failover. Let’s discuss the failover scenario and how our system handles such cases.

A primary replica discards its state about sessions, handles, and locks when it fails or loses leadership. This must be followed by the election of a new primary replica with the following two possible options:

  1. If a primary replica gets elected rapidly, clients can connect with the new primary replica before their own approximation of the primary replica lease’s (local lease) duration runs out.
  2. If the election extends for a longer duration, clients discover the new primary replica by emptying their caches and waiting for the grace period. As a result, our system can keep sessions running during failovers that go past the typical lease time-out, thanks to the grace period.

Once a client has made contact with the new primary replica, the client library and the primary replica give this impression to the application that nothing went wrong by working together. The new primary replica must recreate the in-memory state of the primary replica that it replaced. It accomplishes this in part by:

  • Reading data that has been duplicated via the standard database replication technique and kept safely on disk
  • Acquiring state from clients
  • Making cautious assumptions
Press + to interact
The new primary replica recreates an approximation of the in-memory state of the old primary replica
The new primary replica recreates an approximation of the in-memory state of the old primary replica

Note: All sessions, held locks, and ephemeral files are recorded in the database.

Newly elected primary replica proceedings

The proceedings of the newly elected primary replica are shown in the following slides.


Quiz

After analyzing the proceedings of the newly elected primary replica shown in the slides above, answer the following questions:

  1. In slide 8 of the slide deck above, how does the new primary replica prevent the recreation of an already closed handle? 

  2. Why does the client library struggle to keep the old session going? Why not simply make a new session when we can contact the new primary replica?

Primary Replica in Chubby Locking

Example scenario

The following illustration depicts the progression of an extended primary replica failover event where the client must use its grace time to keep the session alive. The time grows from top to bottom, but we won’t scale it since it’s not linear.

Note: Arrows from left to right show KeepAlive requests, and the ones from right to left show their replies.

Note: During the grace period, the client is uncertain as to whether its lease at the primary replica has ended at this time. It does not end its session, but it does stop all API requests from applications to stop them from seeing erroneous data.

Point to ponder

1.

How did Google implement Chubby’s database?

Show Answer
Q1 / Q1
Did you find this helpful?

Backup

Each Chubby cell’s primary replica takes a snapshot of its database every few hours and uploads it to a GFS file server located in a separate building. Using a different site guarantees that the backup in GFS will endure any damages on a single site and that the backups do not incur cyclic dependencies into the system; otherwise, a GFS cell deployed at a single site may depend on the Chubby cell to choose its primary replica.

Press + to interact
Chubby's backup strategy
Chubby's backup strategy

Backups offer both disaster recovery and a way to set up the database of a replacement replica without putting additional stress on active replica servers.

Mirroring

Chubby enables mirroringA technique that allows a system to automatically maintain multiple copies. ...