We provided a solution to the consensus problem and we’ve seen that it’s possible to achieve a Byzantine agreement. However, the solutions so far rely upon the system being synchronous, whereas the algorithms assume that the message exchanges occur stepwise in rounds.

However, the synchrony assumptions are an essential property of the system model that massively affects the efficiency and functionality of the protocols. In this section, we consider asynchronous message-passing systems. We’ll see that the loss of upper bounds on message transmission times has a dramatic effect on the protocols.

Impossibility in asynchronous systems

In synchronous systems, crashing failures can be detected since there’s a known upper bound on the time of execution of a process and the message delays. But when the system is not synchronous, a known upper bound on the time doesn’t exist, thus execution and message exchanges may take arbitrarily long. This leads to an inherent impossibility of determining whether a process has actually crashed or is only very slow.

Judmayer et al. (2017)Aljosha Judmayer, Nicholas Stifter, Katharina Krombholz, Edgar Weippl, Elisa Bertino, and Ravi Sandhu. Blocks and Chains: Introduction to Bitcoin, Cryptocurrencies, and Their Consensus Mechanisms. Synthesis Lectures on Information Security, Privacy, and Trust. San Rafael, CA, 2017. Morgan & Claypool. describe this as follows: “Effectively, without bounded delays on message transmission times, it’s impossible to deterministically decide whether a process has failed, or its messages have simply not yet arrived. To ensure that the Agreement property of consensus under such condition can’t be violated, the Termination property is no longer satisfiable, as a single failed process could require all correct processes to wait indefinitely for an answer.”

These facts lead to the so-called FLP impossibility result from Fischer et al. (1985)Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. A C M, 32(2): 374-82, April 1985., who proved that no deterministic algorithm can guarantee to reach consensus in an asynchronous system with the possibility of one single crash failure. This means that reaching a deterministic agreement in an asynchronous system is impossible, even with only one faulty process (in the fail-stop model).

Theorem 1: FLP (Fischer, Lynch, and Paterson) impossibility result

In a fully asynchronous environment, there’s no deterministic 1-crash resilient algorithm that guarantees a solution to the consensus problem.

Note: The impossibility result doesn’t state that deterministic consensus is never attainable in asynchronous systems. According to Birman (2007)Ken Birman. Consensus, impossibility results and Paxos. https://www.cs.cornell.edu/ courses/cs614/2007fa/Slides/FLP_and_Paxos.pdf, 2007. Accessed: 2017-02-01., an algorithm is totally correct in formal proofs if “it always terminates. […] FLP proves that any fault-tolerant algorithm solving consensus has runs that never terminate. These runs are extremely unlikely (‘probability zero’). Yet they imply that we can’t find a totally correct solution. ‘Consensus is impossible’ thus means 'consensus is not always possible”.

Thus, no completely asynchronous deterministic consensus protocol can tolerate even one single crash in the fail-stop model, and thus no Byzantine failure as well. In fact, this result doesn’t mean that consensus in an asynchronous system is impossible at all, i.e. it doesn’t mean that processes can never reach consensus if one is faulty. Fischer, Lynch, and PatersonMichael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2): 374-82, April 1985. recognized this fact and hence wrote:

“We have shown that a natural and important problem of fault-tolerant cooperative computing can’t be solved in a totally asynchronous model of computation. These results don’t show that such problems can’t be “solved” in practice; rather, they point up the need for more refined models of distributed computing that better reflect realistic assumptions about processor and communication timings, and, for less stringent requirements on the solution to such problems.”

We’ll give a short overview of the main techniques for solving consensus in asynchronous systems in practice in the next section.

Circumventing the FLP impossibility result

Since the FLP impossibility result only applies to deterministic algorithms, it led to intense research of extensions to the base model in order to circumvent the FLP result. These extensions can be divided into four classes (James Aspnes (2003)James Aspnes. Randomized protocols for asynchronous consensus. Distrib. Comput., 16(2-3): 165-75, September 2003., Miguel Correia et al. (2011)Miguel Correia, Giuliana Santos Veronese, Nuno Ferreira Neves, and Paulo Verissimo. Byzantine consensus in asynchronous message-passing systems: a survey. Int. J. Crit. Comput.-Based Syst., 2(2):141-61, July 2011. and Aljosha Judmayer et al. (2017)Aljosha Judmayer, Nicholas Stifter, Katharina Krombholz, Edgar Weippl, Elisa Bertino, and Ravi Sandhu. Blocks and Chains: Introduction to Bitcoin, Cryptocurrencies, and Their Consensus Mechanisms. Synthesis Lectures on Information Security, Privacy, and Trust. San Rafael, CA, 2017. Morgan Claypool.):

  1. Randomization
  2. Partial synchrony assumptions
  3. Failure detectors
  4. Hybridization and wormholes

Before outlining these classes, we first answer the question about optimal resilience in fully asynchronous systems. We’ve already seen that reaching consensus is quite easy in synchronous systems. Namely, consensus with crash failures is possible with any number of faulty processes, meaning that n>fn>f (cf. this lesson). For Byzantine processes, the same is possible if we use authenticated messages (theorem :Authenticated_synchronous_agreement ), otherwise, the Byzantine agreement without authentication requires f<n/3f<n / 3 faulty processes (theorem:Lower_bound_number_of_processes ).

However, reaching consensus in an asynchronous system requires much more complex algorithms than their deterministic counterparts. A direct consequence of this fact is lower optimal resiliencies.

Theorem 2: upper bound on the number of faulty processes for crash failures

“There is no consensus algorithm for the asynchronous model that tolerates fn/2f \geq n / 2 many fail-stop failures”.

Proof

Consider this research paperRoger Wattenhofer. Distributed systems. https://disco.ethz.ch/courses/distsys/, 2017. Accessed: 2017-01-02. for proof.

This means that f<n/2f<n / 2 is the optimal fault-tolerance in the asynchronous case for fail-stop failures. As a consequence, this defines the upper bound on the number of faulty processes for any kind of failure, meaning there are 2f+12 f+1 processes needed, hence a majority of processes have to be correct at least in order to achieve agreement.

Tolerating Byzantine failures increases the complexity of the protocols and hence also the number of correct processes needed. Toueg (1983) showed that Byzantine agreement in asynchronous message-passing systems requires at least 3f+13 f+1 processes, stating the following theorem:

Theorem 3: upper bound on the number of faulty processes for Byzantine failures

There are no Byzantine agreement protocols for asynchronous systems where f>n/3f>n / 3. This holds even if message authentication is assumed.

In particular, the Byzantine agreement requires f<n/3f<n / 3 with randomization, failure detectors or partial synchrony, or asynchrony even with authentication, but can be solved with f<n/2f<n / 2 by using a hybrid system model (Miguel Correia et al. (2010)Miguel Correia, Giuliana S. Veronese, and Lau Cheuk Lung. Asynchronous Byzantine consensus with 2 f+1 processes. In Proceedings of the 2010 ACM Symposium on Applied Computing, SAC’10, pages 475-80, New York, NY, USA, 2010. ACM.).

Get hands-on with 1400+ tech skills courses.