The Hadoop Distributed File System (HDFS) stores files in block-sized chunks called data blocks. Each block is stored as an independent unit, and the block size is 128 MB by default. However, users can adjust it to suit their requirements by setting the dfs.block.size property (dfs.blocksize in newer Hadoop releases) in the hdfs-site.xml configuration file.
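For example, a minimal hdfs-site.xml entry that overrides the default with a 256 MB block size might look like the sketch below (the value and the choice of 256 MB are illustrative; the property takes a size in bytes):

```xml
<!-- hdfs-site.xml: sketch of overriding the default block size -->
<configuration>
  <property>
    <!-- newer Hadoop releases prefer the name dfs.blocksize -->
    <name>dfs.block.size</name>
    <!-- value is in bytes: 256 MB = 256 * 1024 * 1024 -->
    <value>268435456</value>
  </property>
</configuration>
```

Note that changing this setting affects only files written after the change; existing files keep the block size they were written with.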
If the file size is not an exact multiple of 128 MB, the last block is smaller than the rest; for example, a 300 MB file is stored as two 128 MB blocks plus one 44 MB block.
There is no limitation on file size: because a file is split into blocks that can be spread across many DataNodes, it can be larger than any single disk in the network.
Since blocks are of a fixed size, we can easily calculate how many blocks a file needs or how many fit on a given disk (see the sketch below). This keeps the storage subsystem simple.
Blocks are easy to replicate between DataNodes and, thus, provide fault tolerance and high availability.
Since blocks don’t require storing file metadata (such as permissions) alongside the data, a separate component, the NameNode, can handle metadata on its own.
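As a rough illustration of the fixed-size arithmetic mentioned above, the following sketch (plain Python, not Hadoop code; the 128 MB default and the 300 MB example file are assumptions for the demo) splits a file size into 128 MB blocks and reports the size of the last, partial block:

```python
# Sketch: how a file size maps onto fixed-size HDFS blocks.
# Illustrative arithmetic only; it does not talk to an HDFS cluster.

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size, in bytes


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of `file_size` bytes occupies."""
    if file_size == 0:
        return []
    full_blocks = file_size // block_size
    remainder = file_size % block_size
    blocks = [block_size] * full_blocks
    if remainder:
        blocks.append(remainder)  # the last block is smaller than the rest
    return blocks


if __name__ == "__main__":
    file_size = 300 * 1024 * 1024  # a hypothetical 300 MB file
    blocks = split_into_blocks(file_size)
    print(f"number of blocks: {len(blocks)}")                      # 3
    print(f"last block size: {blocks[-1] / (1024 * 1024):.0f} MB") # 44 MB
```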