...

/

Replication and Coordination of State Machines

Replication and Coordination of State Machines

Learn to replicate state machines with coordination to maintain a fault-tolerant service.

A single state machine will be as fault-tolerant as the node it is running on. Replicating a state machine on multiple nodes can make it tt fault-tolerantA t fault-tolerant system might continue to operate correctly even if more than t nodes fail, but its correct operation cannot be guaranteed beyond failures of t nodes. We need all our replicas to behave similarly for successful replication.

Outputs from replicas

By behaving similarly, we mean producing the same output. All state machines in a group of replicas will produce the same output if the following conditions are satisfied for every replica that runs on a non-faulty node (or processor):

  1. Every replica starts in the same initial state.

  2. Every replica executes the same requests in the same order.

How many replicas?

Since failures are independent, we will assume that a failure can affect at most one node and (as a result) one state machine. The combined output of an ensemble of replicas resulting from replicating a state machine is the output of its tt fault-tolerant state machine.

If nodes can experience Byzantine failures, then for the replica group to be tt fault-tolerant, the following must be true:

  1. The group must have a minimum of 2t+12t+1 state machine replicas.

  2. The group's output must be the output produced by a majority of the replicas in the group.

As long as the failures are no more than tt failures in a tt fault-tolerant replica group, we can find out the correct output.

Points to ponder

1.

In situations when Byzantine failures are possible, how can we determine if tt nodes have failed?

Show Answer
Q1 / Q2

If nodes can only experience fail-stop failures, then we need at least t+1t+1 replicas to be t fault-tolerant. Since such ...

Access this course and 1400+ top-rated courses and projects.