What Is Consensus?

Learn about distributed consensus.

Introduction

A distributed system is a group of hosts coordinating to achieve a common goal. These hosts can be spatially separated from each other, but for the client, it is a single view of the entire system.

The following properties are considered essential to a distributed system:

  • Concurrent: Hosts in the distributed system execute concurrently, processing multiple operations simultaneously as other hosts require coordination.

  • Sequencing of operations: The core property of a distributed system is to sequence the order of events since it is impossible to determine which event occurred before the other in a concurrent environment. Since there is no global clock to synchronize, various algorithms have been proposed to determine the order of events.

  • Unreliable networks: In a distributed system with multiple hosts connected via networks, network failures and partitions are inevitable. Packets can be lost, reordered, or duplicated. The hosts can go down and respond slowly. Therefore, designing distributed systems requires building fault-tolerant systems where the failure of one component is handled gracefully and doesn’t cascade across the hosts.

  • Communication: Hosts in distributed systems communicate with each other through message passing, which can be synchronous or asynchronous. In synchronous communication, messages will be delivered within a fixed period, whereas message delivery semantics are not deterministic in asynchronous communication.

Consensus

Consensus in a distributed system is the mechanism by which distributed processes or hosts agree on a particular decision. In a distributed environment with multiple hosts, reliability is a core requirement in the presence of network failures. Therefore, multiple hosts coordinate to perform an action that requires the coordinating hosts to agree on some data value. Some examples of consensus include:

  • Leader election: In a distributed database, multiple hosts should agree on who should be the leader node so that only one leader to receive writes exists.

  • Ticket reservation system: A flight ticket booking system receives multiple requests to reserve a particular seat, and the database can allocate the seat for only one client.

  • Username uniqueness: A web application receives multiple requests from different users to create a unique username. The database has to ensure a unique username assignment globally.

Every host in the consensus algorithm plays one of the following roles:

  • Proposers: Proposers are leaders who determine the outcome of the proposal request.

  • Acceptors: Acceptors accept proposed values from proposers and let proposers know if something got accepted previously.

  • Learners: Learners are hosts who learn the outcome.

Replicated state machine

Solving consensus problems puts us one step away from implementing a replicated state machine. A replicated state machine is a deterministic state machine replicated across multiple hosts but which provides a view of a single state machine. Even if one of the hosts is faulty, the state machine remains functional and deterministic.

A valid atomic transaction in a replicated state machine causes the system's state to transition from one state to the next based on the input parameters.

  • A transaction log is a sequence of transactions stored in the state machine.

  • The rules for transitioning the transactions from one state to another are called state transition logic or state machine.

We can think about replicated state machines as a set of distributed hosts that start with the same initial value. Then, for each state transition, the hosts decide on the next value and collectively agree on a value (i.e., reach a consensus).

Get hands-on with 1300+ tech skills courses.