Replication and Coordination of State Machines

Learn to replicate state machines with coordination to maintain a fault-tolerant service.

A single state machine will be as fault-tolerant as the node it is running on. Replicating a state machine on multiple nodes can make it tt fault-tolerantA t fault-tolerant system might continue to operate correctly even if more than t nodes fail, but its correct operation cannot be guaranteed beyond failures of t nodes. We need all our replicas to behave similarly for successful replication.

Outputs from replicas

By behaving similarly, we mean producing the same output. All state machines in a group of replicas will produce the same output if the following conditions are satisfied for every replica that runs on a non-faulty node (or processor):

  1. Every replica starts in the same initial state.

  2. Every replica executes the same requests in the same order.

How many replicas?

Since failures are independent, we will assume that a failure can affect at most one node and (as a result) one state machine. The combined output of an ensemble of replicas resulting from replicating a state machine is the output of its tt fault-tolerant state machine.

If nodes can experience Byzantine failures, then for the replica group to be tt fault-tolerant, the following must be true:

  1. The group must have a minimum of 2t+12t+1 state machine replicas.

  2. The group's output must be the output produced by a majority of the replicas in the group.

As long as the failures are no more than tt failures in a tt fault-tolerant replica group, we can find out the correct output.

Level up your interview prep. Join Educative to access 80+ hands-on prep courses.