Agreement in Asynchronous Systems
Learn about the Byzantine agreement for asynchronous systems.
We'll cover the following
- Impossibility in asynchronous systems
- Circumventing the FLP impossibility result
We provided a solution to the consensus problem and we’ve seen that it’s possible to achieve a Byzantine agreement. However, the solutions so far rely upon the system being synchronous, whereas the algorithms assume that the message exchanges occur stepwise in rounds.
However, the synchrony assumptions are an essential property of the system model that massively affects the efficiency and functionality of the protocols. In this section, we consider asynchronous message-passing systems. We’ll see that the loss of upper bounds on message transmission times has a dramatic effect on the protocols.
Impossibility in asynchronous systems
In synchronous systems, crashing failures can be detected since there’s a known upper bound on the time of execution of a process and the message delays. But when the system is not synchronous, a known upper bound on the time doesn’t exist, thus execution and message exchanges may take arbitrarily long. This leads to an inherent impossibility of determining whether a process has actually crashed or is only very slow.
These facts lead to the so-called FLP impossibility result from
Theorem 1: FLP (Fischer, Lynch, and Paterson) impossibility result
In a fully asynchronous environment, there’s no deterministic 1-crash resilient algorithm that guarantees a solution to the consensus problem.
Note: The impossibility result doesn’t state that deterministic consensus is never attainable in asynchronous systems. According to
, an algorithm is totally correct in formal proofs if “it always terminates. […] FLP proves that any fault-tolerant algorithm solving consensus has runs that never terminate. These runs are extremely unlikely (‘probability zero’). Yet they imply that we can’t find a totally correct solution. ‘Consensus is impossible’ thus means 'consensus is not always possible”. Birman (2007) Ken Birman. Consensus, impossibility results and Paxos. https://www.cs.cornell.edu/ courses/cs614/2007fa/Slides/FLP_and_Paxos.pdf, 2007. Accessed: 2017-02-01.
Thus, no completely asynchronous deterministic consensus protocol can tolerate even one single crash in the fail-stop model, and thus no Byzantine failure as well. In fact, this result doesn’t mean that consensus in an asynchronous system is impossible at all, i.e. it doesn’t mean that processes can never reach consensus if one is faulty.
“We have shown that a natural and important problem of fault-tolerant cooperative computing can’t be solved in a totally asynchronous model of computation. These results don’t show that such problems can’t be “solved” in practice; rather, they point up the need for more refined models of distributed computing that better reflect realistic assumptions about processor and communication timings, and, for less stringent requirements on the solution to such problems.”
We’ll give a short overview of the main techniques for solving consensus in asynchronous systems in practice in the next section.
Circumventing the FLP impossibility result
Since the FLP impossibility result only applies to deterministic algorithms, it led to intense research of extensions to the base model in order to circumvent the FLP result. These extensions can be divided into four classes (
- Randomization
- Partial synchrony assumptions
- Failure detectors
- Hybridization and wormholes
Before outlining these classes, we first answer the question about optimal resilience in fully asynchronous systems. We’ve already seen that reaching consensus is quite easy in synchronous systems. Namely, consensus with crash failures is possible with any number of faulty processes, meaning that (cf. this lesson). For Byzantine processes, the same is possible if we use authenticated messages (theorem
However, reaching consensus in an asynchronous system requires much more complex algorithms than their deterministic counterparts. A direct consequence of this fact is lower optimal resiliencies.
Theorem 2: upper bound on the number of faulty processes for crash failures
“There is no consensus algorithm for the asynchronous model that tolerates many fail-stop failures”.
Proof
Consider this
This means that is the optimal fault-tolerance in the asynchronous case for fail-stop failures. As a consequence, this defines the upper bound on the number of faulty processes for any kind of failure, meaning there are processes needed, hence a majority of processes have to be correct at least in order to achieve agreement.
Tolerating Byzantine failures increases the complexity of the protocols and hence also the number of correct processes needed. Toueg (1983) showed that Byzantine agreement in asynchronous message-passing systems requires at least processes, stating the following theorem:
Theorem 3: upper bound on the number of faulty processes for Byzantine failures
There are no Byzantine agreement protocols for asynchronous systems where . This holds even if message authentication is assumed.
In particular, the Byzantine agreement requires with randomization, failure detectors or partial synchrony, or asynchrony even with authentication, but can be solved with by using a hybrid system model (
Get hands-on with 1400+ tech skills courses.