Evaluation of GFS
Let's recap how the design of GFS fulfills its promised requirements.
Scalability
The GFS architecture supports storing large files by splitting them into fixed-size chunks, which are stored on multiple chunkservers. To end users, the file still appears as a single unit.
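To make this concrete, here is a minimal sketch (not the actual GFS client library) of how a client-side read at a byte offset could be translated into a chunk index plus an offset inside that chunk; the function name `locate` is made up for illustration, while the 64 MB chunk size comes from the GFS design.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed-size 64 MB chunks

def locate(byte_offset: int) -> tuple[int, int]:
    """Map a byte offset within a file to (chunk index, offset inside that chunk)."""
    chunk_index = byte_offset // CHUNK_SIZE
    chunk_offset = byte_offset % CHUNK_SIZE
    return chunk_index, chunk_offset

# The client would then ask the manager for the chunk handle and chunkserver
# locations of (filename, chunk_index) and read from one of those chunkservers.
print(locate(200 * 1024 * 1024))  # -> (3, 8388608): the 4th chunk, 8 MB into it
```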
More chunkservers can easily be added to the cluster to store an increasing amount of data. Unlike traditional systems, we don't need to replace existing disks with higher-capacity ones and copy the data over to them. With the GFS architecture, admins simply add more chunkservers, and the GFS manager starts utilizing them.
A single GFS cluster can store hundreds of terabytes of data. If a single cluster doesn't suffice for storing all the tenants' data, multiple specialized GFS clusters can be created for different tenants or for different applications of the same tenant.
Note: GFS provides horizontal scalability by adding more chunkservers as needed. However, this scalability is not infinite, primarily because of the single manager in the system.
Availability
GFS stores three copies of each chunk on different chunkservers by default. If one fails, the others can serve the data. Moreover, the replica placement strategy of GFS also supports availability. Chunkservers are organized into machine racks. If we spread the replicas only across different chunkservers within the same rack, the system survives a chunkserver failure, but we still risk losing all replicas if the entire rack is damaged or goes offline due to a shared resource, such as power or a network switch. Spreading the replicas across chunkservers in different racks guards against such rack-level failures as well.
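The following is a hedged sketch of rack-aware placement, not the policy GFS actually ships: the function `place_replicas`, its parameters, and the `(server_id, rack_id)` representation are all assumptions made for illustration. It simply prefers one chunkserver per rack before reusing a rack.

```python
import random
from collections import defaultdict

def place_replicas(chunkservers, num_replicas=3):
    """Pick `num_replicas` chunkservers for a chunk, preferring distinct racks
    so that a single rack failure cannot take out every replica.
    `chunkservers` is a list of (server_id, rack_id) pairs."""
    by_rack = defaultdict(list)
    for server, rack in chunkservers:
        by_rack[rack].append(server)

    racks = list(by_rack)
    random.shuffle(racks)

    chosen = []
    # First pass: at most one server per rack.
    for rack in racks:
        if len(chosen) == num_replicas:
            break
        chosen.append(random.choice(by_rack[rack]))

    # Second pass: if there are fewer racks than replicas, fall back to
    # distinct servers even if some of them share a rack.
    if len(chosen) < num_replicas:
        remaining = [s for s, _ in chunkservers if s not in chosen]
        chosen += random.sample(remaining, min(num_replicas - len(chosen), len(remaining)))
    return chosen

# Example: 5 chunkservers over 3 racks; each call returns 3 servers on 3 different racks.
servers = [("cs1", "rackA"), ("cs2", "rackA"), ("cs3", "rackB"),
           ("cs4", "rackB"), ("cs5", "rackC")]
print(place_replicas(servers))
```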
The GFS manager re-replicates a chunk if one or more of its replicas are lost permanently. This is how GFS keeps the data available to the clients. The GFS manager prioritizes re-replication of chunks that have lost two of their three replicas. Let's see how the manager keeps the metadata available so that client operations can continue at all times.
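A minimal sketch of that prioritization, assuming the manager tracks the number of live replicas per chunk handle; the function name `rereplication_queue` and the input format are hypothetical, but the idea matches the text: the further a chunk is below its replication goal, the sooner it is re-replicated.

```python
import heapq

REPLICATION_GOAL = 3  # default number of replicas per chunk

def rereplication_queue(chunks):
    """Order chunk handles so the ones furthest below the replication goal come first.
    `chunks` maps chunk_handle -> number of live replicas."""
    heap = []
    for handle, live in chunks.items():
        missing = REPLICATION_GOAL - live
        if missing > 0:
            # heapq is a min-heap, so push the negative deficit to pop the
            # most under-replicated chunk first.
            heapq.heappush(heap, (-missing, handle))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# A chunk down to a single replica is scheduled before a chunk missing only one replica.
print(rereplication_queue({"c1": 2, "c2": 1, "c3": 3}))  # -> ['c2', 'c1']
```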
The GFS manager holds all the metadata that might be needed or can change due to client operations. The single manager could be a single point of failure, making the metadata unavailable to clients when it fails. To guard against this, GFS checkpoints the metadata state and logs every metadata change in an operation log that is kept on the manager's hard disk and on remote replicated storage. If the manager undergoes a temporary failure, it restarts in seconds by loading the checkpoint (from the backup storage) and then applying the logged operations. If just the manager process fails and the manager restarts on the same server, it can use the backup from the local disk. ...
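A simplified sketch of that recovery path under stated assumptions: the checkpoint is a serialized metadata snapshot and the operation log holds one mutation record per line. The file formats, the `recover_manager_state` function, and the `apply_mutation` helper are all hypothetical; only the load-checkpoint-then-replay-log sequence reflects the design described above.

```python
import json
import pickle

def recover_manager_state(checkpoint_path, oplog_path):
    """Rebuild in-memory metadata after a manager restart: load the latest
    checkpoint, then replay the mutations logged after that checkpoint."""
    with open(checkpoint_path, "rb") as f:
        metadata = pickle.load(f)          # e.g. {"files": {...}}

    with open(oplog_path) as f:
        for line in f:
            op = json.loads(line)          # one logged mutation per line
            apply_mutation(metadata, op)   # redo the change against the snapshot
    return metadata

def apply_mutation(metadata, op):
    """Hypothetical redo logic for a couple of mutation types."""
    if op["type"] == "create_file":
        metadata["files"][op["path"]] = []
    elif op["type"] == "add_chunk":
        metadata["files"][op["path"]].append(op["chunk_handle"])
```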