Design and Evaluation of Colossus

Learn to achieve scalability and low latency with the Colossus design.

Colossus control plane and scalability

We saw the architecture of Colossus and its components in the previous lesson. The most significant and interesting part of Colossus is the control plane, which manages the underlying storage (the file servers) and replaces the single GFS manager. A GFS cluster consists of a single manager and multiple chunkservers, and a data center can host several GFS cluster instances. If an application's data exceeds what a single GFS cluster can handle, the application has to use multiple GFS cluster instances. Storing data across multiple clusters, in turn, requires partitioning logic that decides which data goes to cluster 1, which goes to cluster 2, and so on.
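To make the partitioning burden concrete, here is a minimal sketch of the kind of logic an application might have to carry when it spreads files over several GFS clusters. The cluster names and the `pick_cluster` helper are hypothetical, and hashing the file path is only one possible scheme, not something GFS prescribes.

```python
import hashlib

# Hypothetical cluster identifiers, chosen by the application for illustration.
GFS_CLUSTERS = ["gfs-cluster-1", "gfs-cluster-2", "gfs-cluster-3"]


def pick_cluster(file_path: str, clusters=GFS_CLUSTERS) -> str:
    """Map a file path to one cluster using a stable hash of the path."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    # sha256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(file_path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


if __name__ == "__main__":
    for path in ["/videos/a.mp4", "/videos/b.mp4", "/logs/2024-01-01.txt"]:
        print(path, "->", pick_cluster(path))
```

With a scheme like this, every reader and writer must agree on the same cluster list, and adding a cluster changes the mapping for most paths. That operational cost sits with the application rather than with the file system.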

With GFS clusters, applications themselves are responsible for partitioning data across different GFS cluster instances. The need for multiple GFS clusters arises because a single cluster cannot scale to meet the application's growing data demands. As discussed in the previous lesson, the root cause of these scalability issues is the single manager. Colossus comes with a control plane that replaces the single GFS manager and lets a Colossus cluster scale to exabytes of data and hundreds of thousands of machines.

A typical Colossus cluster

The control plane is the foundation of the Colossus file system. As shown in the illustration above, many clients share the same storage pool (the D file servers): YouTube servers storing and retrieving their data, Google Cloud storage attached to Google Compute Engine VMs, and Ads MapReduce nodes. All of these clients can share the pool because the Colossus control plane manages it. The control plane gives clients the illusion that they have isolated file ...