Disk Blocks & HDFS Blocks

This lesson discusses disk blocks, filesystem blocks, and HDFS blocks.

We discussed disk blocks at the start of the chapter. A disk block is the smallest unit of data that can be read from or written to a disk. Disk blocks have traditionally been 512 bytes in size, although many newer drives use 4 KB physical sectors. The filesystem sitting on top of the physical disk does not work with disk blocks directly; it works with an abstraction called the filesystem block. The filesystem block size is an integral multiple of the disk block size, usually a few kilobytes. However, this complexity is hidden from the end users of the filesystem.
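To make the relationship between the two block sizes concrete, here is a minimal Python sketch. It assumes a 512-byte disk block and a 4,096-byte filesystem block (illustrative values, not read from any particular system) and computes how many disk blocks make up one filesystem block, and how many filesystem blocks a file occupies once a partially filled last block is rounded up. The function names are hypothetical.

```python
# Illustrative sizes only (assumptions, not read from a real disk or filesystem).
DISK_BLOCK = 512    # bytes: traditional disk block (sector) size
FS_BLOCK = 4096     # bytes: a common filesystem block size


def disk_blocks_per_fs_block(fs_block: int = FS_BLOCK, disk_block: int = DISK_BLOCK) -> int:
    """Return how many disk blocks make up one filesystem block."""
    if fs_block % disk_block != 0:
        raise ValueError("filesystem block size must be a multiple of the disk block size")
    return fs_block // disk_block


def fs_blocks_for_file(file_size: int, fs_block: int = FS_BLOCK) -> int:
    """Return how many filesystem blocks a file of file_size bytes occupies.

    A partially filled last block still counts as a whole block, so the
    division rounds up using integer arithmetic.
    """
    return (file_size + fs_block - 1) // fs_block


if __name__ == "__main__":
    print(disk_blocks_per_fs_block())   # 8
    print(fs_blocks_for_file(10_000))   # 3 (two full blocks plus a partial one)
```

In this simplified model, a 10,000-byte file consumes 12,288 bytes (three 4,096-byte blocks) of filesystem space, yet the user only ever sees a 10,000-byte file.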

HDFS is not a physical filesystem but a virtual abstraction layered over the disk-based filesystems of the machines in the cluster. HDFS can't be browsed like a local filesystem; you need the HDFS shell, the HDFS web UI, or programmatic APIs to do that. The terms "block" and "block size" have a different meaning in the HDFS context. Let's explore them next.

HDFS block

A file in HDFS is logically divided into HDFS blocks. Each HDFS block is physically made up of blocks of the underlying filesystem, whose size is in turn an integral multiple of the disk block size.
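The three layers can be checked with a little arithmetic. The Python sketch below assumes a 128 MiB HDFS block (the default `dfs.blocksize` value in Hadoop 2.x and 3.x), a 4,096-byte filesystem block, and a 512-byte disk block; change the constants to match your own cluster. The helper name `hdfs_block_layout` is hypothetical.

```python
# Assumed sizes for illustration; adjust to your cluster's configuration.
DISK_BLOCK = 512                   # bytes: disk block
FS_BLOCK = 4096                    # bytes: block of the DataNode's local filesystem
HDFS_BLOCK = 128 * 1024 * 1024     # bytes: HDFS block (128 MiB)


def hdfs_block_layout(hdfs_block: int = HDFS_BLOCK,
                      fs_block: int = FS_BLOCK,
                      disk_block: int = DISK_BLOCK) -> tuple[int, int]:
    """Return (filesystem blocks, disk blocks) that make up one full HDFS block."""
    # Each layer must be an integral multiple of the layer beneath it.
    for outer, inner in ((hdfs_block, fs_block), (fs_block, disk_block)):
        if outer % inner != 0:
            raise ValueError(f"{outer} is not a multiple of {inner}")
    fs_per_hdfs = hdfs_block // fs_block
    disk_per_hdfs = fs_per_hdfs * (fs_block // disk_block)
    return fs_per_hdfs, disk_per_hdfs


if __name__ == "__main__":
    print(hdfs_block_layout())  # (32768, 262144)
```

These figures describe a full block. On a DataNode, each HDFS block is stored as a file in the local filesystem, so a final block holding less data than the block size occupies only the filesystem blocks it actually needs.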

One benefit of the block abstraction for a distributed filesystem like HDFS is that a file can be larger than any single disk in the cluster, because the blocks of a file do not have to be stored on the same disk. In the latest version of Hadoop, HDFS ...
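The following Python sketch illustrates why blocks let a file outgrow a single disk: it splits a file into fixed-size blocks and spreads them over several nodes. The 128 MiB block size, the round-robin placement, and the node names are assumptions for illustration only. Real HDFS uses its own block placement policy and replicates every block, and this toy model ignores both.

```python
# Toy model: assumed block size, hypothetical node names, no replication.
HDFS_BLOCK = 128 * 1024 * 1024  # bytes (assumed 128 MiB)


def split_into_blocks(file_size: int, block_size: int = HDFS_BLOCK) -> list[int]:
    """Return the length of each block; only the last block may be shorter."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_round_robin(blocks: list[int], nodes: list[str]) -> dict[str, int]:
    """Assign block i to nodes[i % len(nodes)] and return bytes stored per node.

    This is NOT the HDFS placement policy, just the simplest way to show
    that no single node has to hold the entire file.
    """
    usage = {node: 0 for node in nodes}
    for i, length in enumerate(blocks):
        usage[nodes[i % len(nodes)]] += length
    return usage


if __name__ == "__main__":
    one_tib = 1024 ** 4                              # a 1 TiB file
    blocks = split_into_blocks(one_tib)
    usage = place_round_robin(blocks, ["node1", "node2", "node3", "node4"])
    print(len(blocks))                               # 8192
    print(usage["node1"] // 1024 ** 3)               # 256 (GiB on node1)
```

With four nodes, each one stores 256 GiB of the 1 TiB file, so no individual disk needs to be as large as the file itself.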
