Introduction to Colossus

Understand the importance of Colossus and its high-level design.

Colossus is the successor to the Google File System (GFS). Let's examine why Google needed to develop a new file system when GFS already existed.


Question: Which component of GFS do you think limited its scalability and led Google to design Colossus?

Limitation of GFS

Disclaimer: Google hasn’t revealed the full design of the Colossus system, so many questions remain open. For some of them, we offer speculative answers, while for others, we encourage learners to consider them on their own.

Need for Colossus

GFS was built to store a few million files totaling hundreds of terabytes of data. At this scale, a single manager with some workload optimizations was sufficient. The single-manager design also let Google build a file system for its users much more rapidly than a distributed-manager design would have allowed, because distributed models are complex and take time to design.
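A rough back-of-the-envelope sketch can show why one manager's memory was enough at this scale, and how the same estimate grows with the amount of data stored. The 64 MB chunk size and the bound of 64 bytes of metadata per chunk are taken from the published GFS paper. The 300 TB and 1 EB inputs, and the function and constant names, are illustrative assumptions. The estimate covers chunk metadata only and ignores per-file namespace metadata.

```python
# Back-of-the-envelope estimate of the memory a single manager needs
# for chunk metadata. Illustrative only; names are hypothetical.

CHUNK_SIZE_BYTES = 64 * 2**20     # 64 MB chunks (figure from the GFS paper)
METADATA_PER_CHUNK_BYTES = 64     # upper bound per chunk (GFS paper)

TB = 2**40
EB = 2**60


def chunk_metadata_bytes(total_data_bytes: int) -> int:
    """Upper-bound estimate of chunk metadata for the given amount of data."""
    # Ceiling division: a partially filled chunk still needs a metadata entry.
    chunks = -(-total_data_bytes // CHUNK_SIZE_BYTES)
    return chunks * METADATA_PER_CHUNK_BYTES


# Assumed data sizes, chosen only to contrast the two scales.
for label, size in [("300 TB", 300 * TB), ("1 EB", EB)]:
    mib = chunk_metadata_bytes(size) / 2**20
    print(f"{label}: ~{mib:,.0f} MiB of chunk metadata")
```

Under these assumptions, 300 TB of data needs about 300 MiB of chunk metadata, which fits comfortably in one machine's memory, while 1 EB needs about 1 TiB before any per-file metadata is counted.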

All of Google's applications, including Google Cloud Storage services, use Google's own file system. The growing number of Google applications and Cloud users has led to massive growth in data needs, requiring a file system that can scale to exabytes (thousands of petabytes). GFS can't scale to exabytes because it relies on a single manager. A single manager is responsible for managing metadata, storing metadata in memory and on the manager node's local disk, scanning through metadata for garbage ...