Evaluation of Chubby
Let's evaluate Chubby's design.
Availability
One of the main requirements of Chubby’s design was ensuring high availability. Even so, we should always account for the probability of failure and avoid treating services like Chubby as if they were always available.
The global Chubby cell is almost always online since it is uncommon for more than two geographically distant data centers to be down simultaneously. However, the availability of the global cell that a client observes is usually lower than the observed availability of the client’s local cell, for the following two reasons.
- The local cell is less likely than the global cell to be partitioned from the client.
- The local cell can be temporarily down, for example due to maintenance, but the same maintenance usually affects the client directly as well. Therefore, the client doesn’t notice Chubby’s unavailability.
We can take three measures to stay realistic about the availability Chubby can achieve, particularly that of the global cell.
- The way we use Chubby in an application has a significant impact on the application’s availability. Keeping the application’s availability as independent of Chubby’s availability as possible lets the application keep working through short Chubby outages.
- We can rely on additional libraries that perform high-level tasks on the application’s behalf, isolating application developers from Chubby’s downtime.
- We can analyze each Chubby malfunction to find ways to prevent similar issues in the future and make applications less reliant on Chubby, ultimately leading to an increase in the overall system’s availability.
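One way to keep an application somewhat independent of Chubby’s availability is to fall back on the last value it read successfully. The following is a minimal sketch of that idea, not Chubby’s actual client library: `ChubbyUnavailable`, `CachedConfigReader`, the `fetch` callable, and the `max_staleness_s` parameter are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class ChubbyUnavailable(Exception):
    """Raised by the hypothetical client when the cell cannot be reached."""


class CachedConfigReader:
    """Serves the last known good value while the lock service is unreachable."""

    def __init__(self, fetch: Callable[[str], str], max_staleness_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # hypothetical read call against a Chubby cell
        self._max_staleness_s = max_staleness_s
        self._clock = clock
        # path -> (value, time of the last successful read)
        self._cache: Dict[str, Tuple[str, float]] = {}

    def read(self, path: str) -> str:
        try:
            value = self._fetch(path)
        except ChubbyUnavailable:
            # No cached copy: the outage cannot be hidden, so re-raise.
            if path not in self._cache:
                raise
            cached, fetched_at = self._cache[path]
            # The cached copy is too old to trust: re-raise as well.
            if self._clock() - fetched_at > self._max_staleness_s:
                raise
            return cached
        # Successful read: refresh the cache and return the fresh value.
        self._cache[path] = (value, self._clock())
        return value
```

A fallback like this only suits data that tolerates some staleness, such as rarely changing configuration. It isn’t appropriate for locks, where acting on stale state could violate mutual exclusion.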
Let’s also evaluate how Chubby handles different failure scenarios to ensure availability.
Primary replica failure
If the primary replica fails, the other replicas wait until the primary replica’s lease expires and then elect a new primary replica from among themselves using the consensus protocol. The primary replica lease is kept short, just a few seconds, so that the replicas don’t have to wait long before electing a new primary replica after a failure. ...
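The lease-expiry rule described above can be sketched as a small simulation. This is an illustrative toy under stated assumptions, not Chubby’s implementation: the `Replica` and `Cell` classes and the `tick` method are hypothetical, and the election is a stand-in that picks the live replica with the smallest name once a majority is alive, whereas the real system runs a consensus protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    alive: bool = True


class Cell:
    """Toy model of a cell whose replicas honor a primary lease."""

    def __init__(self, replicas: List[Replica], lease_s: float):
        self.replicas = replicas
        self.lease_s = lease_s  # assumed lease length in seconds
        self.primary: Optional[Replica] = None
        self.lease_expires_at = 0.0

    def _elect(self, now: float) -> None:
        # Stand-in for consensus: succeeds only if a majority is alive.
        live = [r for r in self.replicas if r.alive]
        if len(live) * 2 > len(self.replicas):
            self.primary = min(live, key=lambda r: r.name)
            self.lease_expires_at = now + self.lease_s
        else:
            self.primary = None

    def tick(self, now: float) -> Optional[str]:
        """Advance to time `now`; return the name of the believed primary."""
        if self.primary is not None and self.primary.alive:
            # A healthy primary keeps renewing its lease.
            self.lease_expires_at = now + self.lease_s
        elif now >= self.lease_expires_at:
            # The old lease has run out, so a new election may start.
            self._elect(now)
        # Until the lease expires, replicas still honor the old primary.
        return self.primary.name if self.primary else None
```

In this model, the time between a primary’s failure and the next election is bounded by the lease length, which is why a short lease shortens the failover window.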