The Big Picture
This lesson gives the reader a new perspective on HDFS.
In this lesson, we’ll discuss the architecture of HDFS, its goals, and its limitations. The Hadoop Distributed File System (HDFS) was designed with the following goals in mind:
- Large files: The system should store large files that are several hundred gigabytes, or even petabytes, in size.
- Streaming data access: HDFS is built and optimized for a write-once, read-many-times pattern. The time taken to read the entire dataset matters more than the latency of reading the first record. HDFS doesn’t support multiple concurrent writers to a file. An existing file can only be appended to at its end; modifying a file at an arbitrary offset is not possible.
- Commodity hardware: Hadoop is designed to run on clusters of cheap commodity hardware. It does not require expensive specialized hardware. The ...
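To make the write-once, append-only pattern from the streaming data access goal concrete, here is a minimal in-memory sketch in Python. It is a toy model and not the HDFS client API: the class `AppendOnlyFile` and its methods (`open_for_append`, `append`, `write_at`, `read_all`, `close`) are hypothetical names invented for illustration. It only mimics three rules stated above: a single writer at a time, appends at the end only, and no writes at arbitrary offsets.

```python
class AppendOnlyFile:
    """Toy in-memory model of a write-once, append-only file (NOT the HDFS API)."""

    def __init__(self):
        self._data = bytearray()
        self._writer = None  # at most one writer may hold the file open

    def open_for_append(self, writer_id):
        # Reject a second writer while the file is already open for writing.
        if self._writer is not None:
            raise PermissionError(f"file is already open for writing by {self._writer}")
        self._writer = writer_id

    def close(self, writer_id):
        # Only the current writer can release the file.
        if self._writer == writer_id:
            self._writer = None

    def append(self, writer_id, payload):
        # New bytes always go to the very end of the file.
        if self._writer != writer_id:
            raise PermissionError("caller has not opened the file for writing")
        self._data.extend(payload)

    def write_at(self, writer_id, offset, payload):
        # Modifying existing bytes at an arbitrary offset is not allowed.
        raise NotImplementedError("random writes are not supported; append only")

    def read_all(self):
        # Readers stream the whole file from start to end.
        return bytes(self._data)


f = AppendOnlyFile()
f.open_for_append("writer-1")
f.append("writer-1", b"record-1\n")
f.append("writer-1", b"record-2\n")
f.close("writer-1")
contents = f.read_all()
```

In this sketch, a second call to `open_for_append` before `close` raises `PermissionError`, and any call to `write_at` raises `NotImplementedError`, which mirrors the restrictions described in the list above. A real cluster enforces these rules inside the file system itself rather than in client code.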