Design of Google Colossus (GFS-II)

Learn how Google scaled its file system from petabytes to exabytes with Colossus, an extension of the Google File System (GFS).

New requirements

Colossus is a file system developed by Google to tackle escalating data requirements that had grown beyond GFS's capabilities.


Question: What do you think is the limiting component of GFS due to which Google designed Colossus?

Limitation of GFS


Scalability to exabytes

GFS was built to store a few million files totaling hundreds of terabytes of data. At that scale, a single manager with some workload optimizations was sufficient. It allowed Google to develop a file system for its users far more rapidly than a distributed-manager design would have: distributed models are complex and take considerably longer to design and build.

All of Google’s applications, including Google Cloud Storage services, run on Google’s own file system. The growing number of Google’s applications and Cloud users has driven massive growth in data needs, requiring a file system that can scale to thousands of petabytes (exabytes). Metadata storage requirements grow along with the volume of data, and GFS can’t scale to exabytes because its single manager must hold all of that metadata in one machine’s memory.
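A quick back-of-envelope calculation makes the single-manager bottleneck concrete. Using figures reported for GFS (64 MB chunks, with roughly 64 bytes of in-memory metadata per chunk), we can estimate how much metadata one manager would need to hold at exabyte scale; the exact per-chunk byte count here is an assumption for illustration:

```python
# Back-of-envelope: why one GFS manager can't hold exabyte-scale metadata.
# Assumptions: 64 MB chunks (as in GFS) and ~64 bytes of metadata per chunk
# (a rough figure used for illustration).

CHUNK_SIZE = 64 * 1024 * 1024   # bytes per chunk
METADATA_PER_CHUNK = 64         # bytes of in-memory metadata per chunk

def metadata_bytes(total_data_bytes: float) -> float:
    """Estimate the metadata a single manager must keep in memory."""
    chunks = total_data_bytes / CHUNK_SIZE
    return chunks * METADATA_PER_CHUNK

PB = 10**15
EB = 10**18

print(f"100 PB of data -> ~{metadata_bytes(100 * PB) / 10**9:.0f} GB of metadata")
print(f"  1 EB of data -> ~{metadata_bytes(1 * EB) / 10**12:.2f} TB of metadata")
```

Under these assumptions, an exabyte of data implies close to a terabyte of metadata, far more than a single manager of that era could keep in RAM, which is why metadata itself had to be distributed.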

The Google Cloud services (the services that all run on the same file system infrastructure) scale when the underlying file system scales

Due to the accelerating volume of data generated by a growing number of applications, GFS could no longer serve as the underlying file system for managing data at such a large scale.

Low latency

At the time GFS was developed, most workloads demanded high throughput rather than low latency, so Google focused on building a high-throughput file system. That design is a poor fit for latency-sensitive applications like video gaming, online meetings (including Google Meet), video serving, etc., which require responses in real time. Because the GFS metadata service is a single point of failure, it can go down for up to a minute. This downtime is not a significant problem for batch-oriented applications that require high throughput and can bear a latency of a ...