System Design Deep Dive: Real-World Distributed Systems/

...

Consensus Prerequisites and Two Generals' Problem

Learn about the fundamentals of the consensus problem.

We'll cover the following...

Consensus
Computational models
Faults
The Two Generals’ Problem
- Two Generals' Problem applied in real life

This chapter lays the foundation of the consensus problem in distributed systems. By consensus, we mean that a set of processes start a protocol where each of them has a choice of possible decisions, and all of them reach a common decision by exchanging network messages. Achieving consensus is easy if there are no failures (network or node failures). However, achieving consensus under different kinds of failures becomes challenging (or even impossible). While extensive research has been done to solve the consensus problem, we have picked three celebrated results for discussion in this chapter. These results are as follows:

The Two Generals’ Problem
The FLP impossibility
The Byzantine Generals Problem

We believe the above results give us ample background for the rest of the chapters in this course.

In this lesson, we will define the consensus problem more formally and review two important system models (synchronous and asynchronous) and different kinds of faults (crash-stop and Byzantine faults). After that, we’ll discuss the Two Generals’ Problem.

Consensus

We need consensus for many computational tasks. Let's take the examples of distributed databases and data replication.

The use of databases is commonplace, and many rely on the abstraction of transactions to work correctly. A database transaction is involved when we purchase goods or services from any online store or conduct a bank withdrawal or deposit. Transactions are impossible without a consensus where, for example, money is deducted from one account and deposited into another as one logical unit. The consensus here involves either all parties moving ahead with the transaction and committing it, or none of the parties moving ahead and aborting the translation.

The second example is data replication to increase durability under data losses or corruption threats. Each copy of data must remain consistent at all places, even when we allow data mutations. Doing so is only possible when all replicas have a consensus on specific values.

Formally, the consensus among $n$ processes that can choose a value from a permissible set of values is to satisfy the following three conditions:

Validity: If every nonfaulty process starts with an initial value $v$ ...

Prologue

File Systems

Google File System (GFS)

Google Colossus File System

Facebook's Tectonic File System

Databases

Google Bigtable

Google Megastore

Google Spanner

Key-value Stores

Many-core Key-value Store

Scaling Memcache

SILT

Amazon DynamoDB

Concurrency Management

Two-phase Locking (2PL)

Google Chubby Locking Service

ZooKeeper

Big Data Processing: Batch to Stream Processing

MapReduce

Spark

Kafka

Consensus

Understanding Consensus: Two Generals, FLP, & Byzantine Generals

Two-phase Commit

State Machine Replication

Paxos

Raft

Epilogue

Consensus Prerequisites and Two Generals' Problem

Consensus