...

/

Protocols for Maintaining Fault Tolerance: Part I

Protocols for Maintaining Fault Tolerance: Part I

Learn how to keep enough replicas in an ensemble of replicas to maintain fault tolerance.

Our protocols for tt fault tolerance in a system provide us with a guarantee that our system will not fail if no more than tt replicas fail. With this guarantee, we must ensure that the number of faulty nodes in an ensemble of replicas does not exceed tt. We can do this by replacing faulty replicas with non-faulty replicas. Let's formally discuss this.

Modeling replica replacement

We define P(τ)P(\tau) as the total number of nodes running state machine replicas in an ensemble of replicas and F(τ)F(\tau) as the number of faulty nodes in that ensemble at time tautau. P(τ)F(τ)P(\tau) - F(\tau) must be greater than a certain number to guarantee that our system will produce the correct output. Here is how we can formally define this combining condition:

Here, Enuf=P(τ)/2 Enuf = P(\tau)/2 when Byzantine failures are possible. And Enuf=0Enuf = 0 when only fail-stop failures are possible.

If the condition above holds, our system will provide the correct output. This is ensured by having the minimum number of non-faulty nodes present in the system, depending on the respective failure types. For Byzantine failures, we need a majority, which means more than half of the total nodes. Therefore, any integer greater than P(τ)/2P(\tau)/2. We only need one non-faulty node for fail-stop failures, which ...