System Design Deep Dive: Real-World Distributed Systems/

...

/

Replication in Megastore

The replication of Megastore gives a unified picture of the data kept in its dependent copies. No matter which replica a client accesses to begin an operation, read or write operations can be started from any replica, and ACID semantics will be retained. Replication is performed per entity group by synchronously replicating the transaction log of the group to a quorum of replicas. Usually, writes need one round of communication between data centers, while reads, in a healthy case, run locally. Megastore guarantees the following for reads:

The read operation will always be based on the write that was acknowledged at last.
Following the completion of a write operation, all future read operations will reflect the modifications made by the write.

Paxos is commonly used by databases to replicate a transaction log, with a distinct Paxos instance utilized for each point in the log. Here are the advantages and disadvantages of Paxos:

Advantage: It handles messages that are delayed or reordered, as well as replicas that fail by simply stopping.
Disadvantage: It results in high latency since it demands multiple rounds of communication.

Due to the drawback mentioned above, Paxos-based real-world systems lower the number of round trips necessary to make the algorithm feasible. First, we will discuss how manager/leader-based systems employ Paxos and then examine how Megastore enhances the efficiency of Paxos in these systems.

Manager-based approaches

Many systems utilize a designated manager through which all reads and writes are routed to reduce latency. The managers’s state is constantly updated because it is involved in all writes. It does not need to communicate with the network to offer reads of the current consensus state.

Disadvantage: Dependence on a manager restricts reading and writing flexibility. It causes a bottleneck. The election of a new manager also slows down the process.

Megastore’s approach

Let’s discuss the improvements and developments in Paxos that made it suitable for Megastore.

Fast reads

Megastore allowed local reads from anywhere. These local reads improve utilization, provide low latencies in all areas, allow for fine-grained read failover, and simplify programming.

Megastore created a service called the Coordinator, which was hosted on servers in each replica’s data center. The coordinator server keeps track of a group of entities for which its replica has observed all the updates made using the Paxos algorithm. The replica contains enough state to serve local reads for entity groups in that set.

Fast writes

Megastore adopts the pre-preparing optimization employed by manager-based techniques to enable quick single-round trip writes. It uses leaders instead of a dedicated manager.

For each log position, Megastore executes a separate Paxos algorithm instance. (It is multi-Paxos. Multi-Paxos can be optimized to skip the preparation phase and directly enter the acceptance phase when the same proposer makes continuous proposals.Source https://www.alibabacloud.com/blog/paxos-raft-epaxos-how-has-distributed-consensus-technology-evolved_597127) The leader for every log position is a ...

Prologue

File Systems

Google File System (GFS)

Google Colossus File System

Facebook's Tectonic File System

Databases

Google Bigtable

Google Megastore

Google Spanner

Key-value Stores

Many-core Key-value Store

Scaling Memcache

SILT

Amazon DynamoDB

Concurrency Management

Two-phase Locking (2PL)

Google Chubby Locking Service

ZooKeeper

Big Data Processing: Batch to Stream Processing

MapReduce

Spark

Kafka

Consensus

Understanding Consensus: Two Generals, FLP, & Byzantine Generals

Two-phase Commit

State Machine Replication

Paxos

Raft

Epilogue