Quorums in Distributed Systems
Look at the concept of quorums and see how they solve low availability problems in synchronous replication.
We'll cover the following...
The main pattern we’ve seen so far is this: writes are performed to all the replica nodes, while reads are performed to one of them. When we ensure that writes are performed to all of them synchronously before replying to the client, we guarantee that the subsequent reads see all the previous writes—regardless of the node that processes the read operation.
Note that, in the above paragraph, we have used the term “performed”. While one node receives a write request in either of the replication algorithms discussed earlier, the data is updated on all the nodes as a result of this request. Similarly, when a node receives a read request, it reads it from its local storage rather than performing a read on all the nodes. In the case of multi-primary replication, reads may be performed on all the nodes to handle write conflicts, but that is one possible solution and can’t be generalized as a pattern.
The problem in synchronous replication
Availability is quite low for write operations, because the failure of a single node makes the system unable to process writes until the node recovers.
Possible solution
To solve this problem, we can use the reverse strategy. That is, we write data only to the node that is responsible for processing a write operation, but process read operations by reading from all the nodes and returning the latest value.
This increases the availability of writes significantly but decreases the availability of reads at the same time. So, we have a trade-off that needs a mechanism to achieve a balance. Let’s see that mechanism.
Quorums
A useful mechanism to achieve a balance in this trade-off is to use quorums.
Let’s consider an example. In a system of three replicas, we can say that writes need to complete in two nodes (as a quorum of two), while reads need to retrieve data from two ...