Raft's Safety, Fault-Tolerance, and Availability Protocols

Let's learn how Raft ensures safety, handles leader and follower crashes, and maintains availability.

We'll cover the following...

Safety
Quiz
Follower and candidate crashes
Timing and availability

Safety

The previous lessons discussed how Raft selects leaders and replicates log entries. However, these mechanisms alone do not guarantee that every state machine executes the same commands in the same order. To see why, consider a follower that is unavailable while the leader commits several log entries. That follower could later be elected leader and overwrite those committed entries with new ones, so that different state machines execute different sequences of commands. The following slides show such a scenario:

[Slides: a follower that missed committed entries is elected leader and overwrites them]

To address this issue, the Raft algorithm restricts which servers can be elected as leaders to ensure that the leader for any given term contains all the entries committed in previous terms.

Election restriction

In leader-based consensus algorithms, the leader is responsible for eventually storing all committed log entries. However, some algorithms allow a leader to be elected without initially having all committed entries. These algorithms require additional mechanisms to identify and transmit missing entries to the new leader during or after the election process, leading to increased complexity.

The following table lists the two election restrictions set by Raft, along with their rationale.

Restriction	Rationale
Raft ensures that every new leader holds all committed entries from previous terms from the moment of its election, so there is no need to transfer them afterward.	This ensures that log entries flow in only one direction, from leaders to followers, and that leaders never overwrite existing entries in their own logs.
Raft prevents a candidate from winning an election unless its log contains all the committed entries. It enforces this restriction through the voting process: a voter denies its vote if its own log is more up-to-date than the candidate's log.	As the voting process dictates, a candidate must contact a majority of the cluster to become the leader. Every committed entry is stored on at least one server in any such majority, so a candidate whose log is missing a committed entry will be rejected by that server. The `RequestVote` RPC enforces this restriction in Raft by including information about the candidate's log.
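The voter-side check described in the table can be sketched as follows. This is a minimal illustration, not a complete `RequestVote` handler: it omits the term and voted-for checks, and the names `LogEntry`, `last_index_and_term`, and `should_grant_vote` are hypothetical. It assumes the comparison rule from the Raft paper, where the log whose last entry has the higher term is more up-to-date, and for equal last terms the longer log is more up-to-date.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    term: int      # term in which the leader created the entry
    command: str   # state machine command (opaque in this sketch)


def last_index_and_term(log):
    """Return (last_index, last_term) using 1-based indices; an empty log is (0, 0)."""
    if not log:
        return 0, 0
    return len(log), log[-1].term


def should_grant_vote(voter_log, candidate_last_index, candidate_last_term):
    """Log check only: grant the vote if the candidate's log is at least
    as up-to-date as the voter's log."""
    voter_last_index, voter_last_term = last_index_and_term(voter_log)
    if candidate_last_term != voter_last_term:
        # Different last terms: the later term is more up-to-date.
        return candidate_last_term > voter_last_term
    # Same last term: the longer log is more up-to-date.
    return candidate_last_index >= voter_last_index


voter_log = [LogEntry(1, "x=1"), LogEntry(2, "y=2"), LogEntry(2, "z=3")]
stale_vote = should_grant_vote(voter_log, candidate_last_index=2, candidate_last_term=1)
equal_vote = should_grant_vote(voter_log, candidate_last_index=3, candidate_last_term=2)
```

Here the first candidate is denied because its last entry comes from an older term than the voter's, while the second candidate's log matches the voter's and the vote is granted.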

Committing entries from previous terms

The leader of a Raft cluster knows that an entry from its current term is committed once the entry has been stored on a majority of the servers. If the leader crashes before committing an entry, future leaders will try to finish replicating it. However, a leader cannot conclude that an entry from a previous term is committed merely because it is stored on a majority of the servers.

Before discussing Raft’s approach to committing log entries from previous terms, let’s look at how an old log entry that is stored on a majority of servers can still be overwritten by a future leader. This hypothetical scenario is illustrated below:

[Slides: an old log entry stored on a majority of servers is overwritten by a future leader]
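To make the difference between current-term and previous-term entries concrete, here is a hedged sketch of how a leader might advance its commit index. It assumes the rule from the Raft paper that a leader counts replicas only for entries from its current term, and that earlier entries then become committed indirectly because they precede a committed entry. The function name `advance_commit_index` and its parameters are hypothetical and chosen for illustration.

```python
def advance_commit_index(commit_index, current_term, log_terms, match_index):
    """Return the new commit index for a leader.

    log_terms[i - 1] is the term of the entry at 1-based index i in the
    leader's log. match_index holds, for every server including the
    leader itself, the highest log index known to be stored on it.
    """
    majority = len(match_index) // 2 + 1
    # Scan from the newest entry down to the first uncommitted one.
    for n in range(len(log_terms), commit_index, -1):
        replicas = sum(1 for m in match_index if m >= n)
        # Replica counting is only trusted for current-term entries.
        if replicas >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


# Term-4 leader: entry 2 (term 2) is on three of five servers,
# but it is not from the current term, so nothing is committed yet.
before = advance_commit_index(1, 4, [1, 2], [2, 2, 2, 1, 1])

# After a term-4 entry reaches a majority, entries 2 and 3 commit together.
after = advance_commit_index(1, 4, [1, 2, 4], [3, 3, 3, 1, 1])
```

In the first call the commit index stays at 1 even though entry 2 sits on a majority. In the second call it moves to 3, which also covers entry 2.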
