What is an HDFS block?

The Hadoop Distributed File System (HDFS) splits files into fixed-size chunks called data blocks. Each block is stored as an independent unit, with a default size of 128 MB. However, users can adjust the block size according to their requirements.

Users can adjust the block size through the dfs.blocksize property (dfs.block.size in older releases, now deprecated) in the hdfs-site.xml configuration file.
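For example, a minimal hdfs-site.xml entry that raises the block size to 256 MB might look like the excerpt below; the value is in bytes, and newer Hadoop releases also accept size suffixes such as 256m.

```xml
<!-- hdfs-site.xml: illustrative excerpt -->
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value> <!-- 256 MB, specified in bytes -->
</property>
```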

If the file size is not an exact multiple of the block size, the last block is smaller and occupies only as much disk space as the data it holds.
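As a quick illustration of the block math, the sketch below splits a hypothetical 300 MB file using the 128 MB default: two full blocks plus a 44 MB final block.

```java
public class BlockMath {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // default HDFS block size: 128 MB
        long fileSize = 300L * 1024 * 1024;  // hypothetical 300 MB file

        long fullBlocks = fileSize / blockSize; // number of full 128 MB blocks
        long lastBlock = fileSize % blockSize;  // size of the final, smaller block

        System.out.println("Full blocks: " + fullBlocks);                       // 2
        System.out.println("Last block: " + lastBlock / (1024 * 1024) + " MB"); // 44 MB
    }
}
```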

Advantages

  1. There is no limitation on file size: because a file's blocks are distributed across many machines, a file can be larger than any single disk in the network.

  2. Since blocks are of a fixed size, we can easily calculate how many blocks fit on a given disk. This simplifies the storage subsystem.

  3. Blocks are easy to replicate between DataNodes and thus provide fault tolerance and high availability (a sketch after this list shows how to inspect a file's block replicas).

  4. Since blocks don't require storing file metadata, such as permissions, alongside the data, another system (the NameNode in HDFS) can handle the metadata separately.
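As a rough sketch of how block placement and replication can be inspected separately from the data itself, the snippet below uses Hadoop's Java FileSystem API to print each block of a file and the DataNodes holding its replicas. The path /data/sample.txt is hypothetical, and a valid cluster configuration is assumed to be on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/sample.txt");  // hypothetical file path
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block; each lists the DataNodes holding a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```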
