Evaluation of GFS
Let's recap how the design of GFS fulfills its promised requirements.
Scalability
The GFS architecture supports storing large files by splitting them into fixed-size chunks, which are stored on multiple chunkservers. To end users, the file still appears as a single unit.
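To make this concrete, here is a minimal sketch (not the actual GFS client library) of how a client-side read at a byte offset could be translated into a chunk index plus an offset inside that chunk; the function name `locate` is made up for illustration, while the 64 MB chunk size comes from the GFS design.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed-size 64 MB chunks

def locate(byte_offset: int) -> tuple[int, int]:
    """Map a byte offset within a file to (chunk index, offset inside that chunk)."""
    chunk_index = byte_offset // CHUNK_SIZE
    chunk_offset = byte_offset % CHUNK_SIZE
    return chunk_index, chunk_offset

# The client would then ask the manager for the chunk handle and chunkserver
# locations of (filename, chunk_index) and read from one of those chunkservers.
print(locate(200 * 1024 * 1024))  # -> (3, 8388608): the 4th chunk, 8 MB into it
```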
More chunkservers can easily be added to the cluster to store an increasing amount of data. Unlike traditional systems, we don't need to replace existing disks with higher-capacity ones and copy the data over to them. With the GFS architecture, admins simply add more chunkservers, and the GFS manager starts utilizing them.
A single GFS cluster can store hundreds of terabytes of data. If a single cluster doesn't suffice for storing all the tenants' data, multiple specialized GFS clusters can be created for different tenants or for different applications of the same tenant.
Note: GFS provides horizontal scalability by adding more chunkservers as needed. However, this scalability is not infinite, primarily because of the single manager in the system.
Availability
GFS stores three copies of each chunk on different chunkservers by default. If one fails, the others can serve the data. Moreover, the replica placement strategy of GFS also supports availability. Chunkservers are organized into machine racks. If we spread the replicas only across different chunkservers within the same rack, the system survives a chunkserver failure, but we still risk losing all replicas if the entire rack is damaged or goes offline due to a shared resource, such as power or a network switch. Spreading the replicas across chunkservers in different racks guards against such rack-level failures as well.
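The following is a hedged sketch of rack-aware placement, not the policy GFS actually ships: the function `place_replicas`, its parameters, and the `(server_id, rack_id)` representation are all assumptions made for illustration. It simply prefers one chunkserver per rack before reusing a rack.

```python
import random
from collections import defaultdict

def place_replicas(chunkservers, num_replicas=3):
    """Pick `num_replicas` chunkservers for a chunk, preferring distinct racks
    so that a single rack failure cannot take out every replica.
    `chunkservers` is a list of (server_id, rack_id) pairs."""
    by_rack = defaultdict(list)
    for server, rack in chunkservers:
        by_rack[rack].append(server)

    racks = list(by_rack)
    random.shuffle(racks)

    chosen = []
    # First pass: at most one server per rack.
    for rack in racks:
        if len(chosen) == num_replicas:
            break
        chosen.append(random.choice(by_rack[rack]))

    # Second pass: if there are fewer racks than replicas, fall back to
    # distinct servers even if some of them share a rack.
    if len(chosen) < num_replicas:
        remaining = [s for s, _ in chunkservers if s not in chosen]
        chosen += random.sample(remaining, min(num_replicas - len(chosen), len(remaining)))
    return chosen

# Example: 5 chunkservers over 3 racks; each call returns 3 servers on 3 different racks.
servers = [("cs1", "rackA"), ("cs2", "rackA"), ("cs3", "rackB"),
           ("cs4", "rackB"), ("cs5", "rackC")]
print(place_replicas(servers))
```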
The GFS manager re-replicates a chunk if one or more of its replicas are lost permanently. This is how GFS keeps the data available to the clients. The GFS manager prioritizes re-replication of chunks that have lost two of their three replicas. Let's see how the manager keeps the metadata available so that client operations can continue at all times.
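A minimal sketch of that prioritization, assuming the manager tracks the number of live replicas per chunk handle; the function name `rereplication_queue` and the input format are hypothetical, but the idea matches the text: the further a chunk is below its replication goal, the sooner it is re-replicated.

```python
import heapq

REPLICATION_GOAL = 3  # default number of replicas per chunk

def rereplication_queue(chunks):
    """Order chunk handles so the ones furthest below the replication goal come first.
    `chunks` maps chunk_handle -> number of live replicas."""
    heap = []
    for handle, live in chunks.items():
        missing = REPLICATION_GOAL - live
        if missing > 0:
            # heapq is a min-heap, so push the negative deficit to pop the
            # most under-replicated chunk first.
            heapq.heappush(heap, (-missing, handle))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# A chunk down to a single replica is scheduled before a chunk missing only one replica.
print(rereplication_queue({"c1": 2, "c2": 1, "c3": 3}))  # -> ['c2', 'c1']
```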
The GFS manager holds all the metadata that might be needed or can change due to client operations. The single manager could be a single point of failure, making the metadata unavailable to clients when it fails. To guard against this, GFS checkpoints the metadata state and logs every metadata change in an operation log that is kept on the manager's hard disk and on remote replicated storage. If the manager undergoes a temporary failure, it restarts in seconds by loading the checkpoint (from the backup storage) and then applying the logged operations. If just the manager process fails and the manager restarts on the same server, it can use the backup from the local disk. ...
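A simplified sketch of that recovery path under stated assumptions: the checkpoint is a serialized metadata snapshot and the operation log holds one mutation record per line. The file formats, the `recover_manager_state` function, and the `apply_mutation` helper are all hypothetical; only the load-checkpoint-then-replay-log sequence reflects the design described above.

```python
import json
import pickle

def recover_manager_state(checkpoint_path, oplog_path):
    """Rebuild in-memory metadata after a manager restart: load the latest
    checkpoint, then replay the mutations logged after that checkpoint."""
    with open(checkpoint_path, "rb") as f:
        metadata = pickle.load(f)          # e.g. {"files": {...}}

    with open(oplog_path) as f:
        for line in f:
            op = json.loads(line)          # one logged mutation per line
            apply_mutation(metadata, op)   # redo the change against the snapshot
    return metadata

def apply_mutation(metadata, op):
    """Hypothetical redo logic for a couple of mutation types."""
    if op["type"] == "create_file":
        metadata["files"][op["path"]] = []
    elif op["type"] == "add_chunk":
        metadata["files"][op["path"]].append(op["chunk_handle"])
```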