Introduction to GFS

Understand the requirements that led to the development of GFS and learn about its architecture.

Problems with traditional file systems

Before Google File System (GFS), there were single-node file systems, network-attached storage (NAS) systems, and storage area networks (SAN). These file systems served well at the time they were designed. However, with the growing demands of data storage and processing needs of large distributed data-intensive applications, these systems had limitations, some of which have been discussed below.

A single-node file system is a system that runs on a single computer and manages the storage attached to it. A single server has limited resources like storage space, processing power, as well as I/O operations that can be performed on a storage disk per second. We can attach substantial storage capacity to a single server, increase the RAM, and upgrade the processor, but there are limits to this type of vertical scaling. A single server also has limitations regarding the number of data reads and writes, and how quickly data is stored and accessed. These limitations restrict the system's ability to process large datasets and serve a large number of clients simultaneously. We also, can't ignore the fact that a single-node system is a single point of failure where the system becomes unavailable to the users. The focus should be on high throughputThroughput measures how much data is being sent through your network at any given moment. It is frequently measured in bits per second. [source: https://networkshardware.com/throughput-vs-latency/] rather than low latencyLatency refers to the amount of time it takes for your data to be sent to its intended destination. It’s also referred to as the delay in moving data between two clients. It is usually measured in milliseconds (ms). The lower it is, the better. [source: https://networkshardware.com/throughput-vs-latency/] for applications requiring large datasets processing.

The network-attached storage (NAS) system consists of a file-level storage server with multiple clients connected to it over a network running the network file system (NFS) protocol. Clients can store and access the files on this remote storage server like the local files. The NAS system has the same limitations as a single-node file system. Setting up and managing a NAS system is easy but expensive to scale. This system can also suffer from throughput issues while accessing large files from a single server.

Level up your interview prep. Join Educative to access 80+ hands-on prep courses.