Issues in Single Leader Replication
Leader-based replication is an ideal choice for read-scaling scenarios where the read requests processed by a distributed system are far more than the number of write requests. This is often true of internet applications. The number of followers can be increased as the read load on the system increases.
However, leader-based replication has its limitations.
All the writes process through the leader which becomes a bottleneck as the number of writes increases.
If the leader is down for any reason e.g. network outage, writes to the system are halted.
The leader-based replication architecture can scale for read requests when asynchronous replication is used. As the number of followers in the system increases so does the probability of at least one of them failing. Thus synchronous replication can potentially make the system unavailable for writes if any one of the follower experiences failure or a network outage hits the system.
Another limitation surfaces when an application tries to read from an asynchronous follower that has fallen behind. Running the same query on the leader versus a follower which is behind will yield different results since the state of the follower hasn’t caught up with that on the master. The state on the follower isn’t consistent with that on the master. The delay between a write happening on the master and the write being reflected on the follower is known as the replication lag. The follower does catch-up eventually with the leader and this phenomenon is called eventual consistency. However, it doesn’t specify when the follower catches up? Is it a few seconds or minutes? The term eventual consistency is deliberately vague on how long the follower may take to catch up to the leader. In practice it could be a fraction of a second.
Read-after-write consistency
Consider the scenario of a Twitter user making a new tweet. In a leader-based replication scenario, the tweet goes to the leader to be committed to the underlying data store. If the user refreshes his timeline, it is possible that the read request may go to a follower that is behind the leader and may not return the ...