Enabling Fault Tolerance and Failure Detection
Learn how to make the key-value store fault-tolerant and able to detect failures.
Handling temporary failures
Typically, distributed systems use a quorum-based approach to handle failures. A quorum is the minimum number of votes required for a distributed transaction to proceed with an operation. If a server that is part of the consensus goes down, we cannot perform the required operation. This reduces the availability and durability of our system.
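The quorum rule above can be illustrated with a minimal sketch. The function name `strict_quorum_write`, the member names, and the `is_up` map are hypothetical and chosen only for illustration; a real store would send the write over the network and count actual acknowledgments.

```python
from typing import Dict, List


def strict_quorum_write(members: List[str], is_up: Dict[str, bool], quorum: int) -> bool:
    """Return True if enough fixed members acknowledge the write.

    Assumption: a member acknowledges if and only if it is up.
    The member set is fixed, so a down member cannot be replaced.
    """
    acks = sum(1 for m in members if is_up.get(m, False))
    return acks >= quorum


# Hypothetical three-member group with one member down.
members = ["A", "B", "C"]
is_up = {"A": False, "B": True, "C": True}

needs_all = strict_quorum_write(members, is_up, quorum=3)       # every member must vote
needs_majority = strict_quorum_write(members, is_up, quorum=2)  # a majority is enough
```

With one member down, the write that needs all three votes is rejected, while the majority write still succeeds. Either way, the set of voters is fixed, which is the limitation a strict quorum carries.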
We will use a “sloppy quorum” instead of strict quorum membership. In a sloppy quorum, the first n healthy nodes from the preference list handle all read and write operations. The n healthy nodes may not always be the first n nodes discovered when moving clockwise in the consistent hash ring.
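A sloppy quorum's selection rule, taking the first n healthy nodes rather than the first n nodes, can be sketched as follows. This is a simplified illustration, not a full implementation: the ring is modeled as a plain list in clockwise order, the node names `n0` to `n4` and the helper `first_n_healthy` are hypothetical, and health is read from an in-memory map instead of a real failure detector.

```python
from typing import Dict, List


def first_n_healthy(ring: List[str], start: int, n: int, healthy: Dict[str, bool]) -> List[str]:
    """Walk the ring clockwise from `start` and collect the first n healthy nodes.

    Unhealthy nodes are skipped, so the result may differ from the first n
    nodes on the ring. Fewer than n nodes are returned if not enough are healthy.
    """
    chosen: List[str] = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)]  # wrap around the ring
        if healthy.get(node, False):
            chosen.append(node)
            if len(chosen) == n:
                break
    return chosen


# Hypothetical five-node ring; the key is assumed to map to position 0.
ring = ["n0", "n1", "n2", "n3", "n4"]
healthy = {name: True for name in ring}

all_up = first_n_healthy(ring, start=0, n=3, healthy=healthy)

healthy["n1"] = False  # simulate a temporary failure
one_down = first_n_healthy(ring, start=0, n=3, healthy=healthy)
```

When every node is up, `all_up` is simply the first three nodes on the ring. After `n1` is marked unhealthy, `one_down` skips it and takes the next healthy node further clockwise, so operations can continue during the outage.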
Let’s consider the following configuration with n = 3. If node A is briefly unavailable or unreachable during a write operation, the request will be sent to the next healthy node from the preference list, which is node ...