Sequence File: Intro

Besides text formats, Hadoop also supports binary formats, and the sequence file is one of them. Binary data typically takes up less disk space than the equivalent textual data. Hadoop also uses sequence files internally: the temporary output of map tasks is stored in this format.

One of the problems Hadoop faces is storing a large number of small files. The NameNode keeps the metadata for every file in memory, so it runs short on memory if the system holds too many small files. Similarly, if the input to a MapReduce job consists of numerous small files, the number of mapper tasks (at least one per file) will be significantly higher than if the same data were stored in fewer, larger files. To overcome these issues, ...
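To make the small-files cost concrete, here is a rough back-of-the-envelope sketch in Java. It is not Hadoop code and calls no Hadoop API. The class name `SmallFilesEstimate`, its methods, the 128 MiB block size, and the figure of roughly 150 bytes of NameNode heap per file or block object are all assumptions chosen for illustration (the last is a commonly quoted rule of thumb, not a number from this lesson). It also assumes one map task per block and that no split spans two files.

```java
public class SmallFilesEstimate {
    // Assumption (rule of thumb, not from the lesson): about 150 bytes of
    // NameNode heap per namespace object (one per file, one per block).
    static final long ASSUMED_BYTES_PER_OBJECT = 150L;

    /** Number of blocks needed to hold one file (ceiling division). */
    static long blocksPerFile(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /** Objects the NameNode tracks: one per file plus one per block. */
    static long namespaceObjects(long fileCount, long fileSizeBytes, long blockSizeBytes) {
        return fileCount * (1L + blocksPerFile(fileSizeBytes, blockSizeBytes));
    }

    /** Map tasks, assuming one task per block and no split spanning files. */
    static long mapTasks(long fileCount, long fileSizeBytes, long blockSizeBytes) {
        return fileCount * blocksPerFile(fileSizeBytes, blockSizeBytes);
    }

    /** Estimated NameNode heap under the assumed per-object cost. */
    static long estimatedHeapBytes(long objects) {
        return objects * ASSUMED_BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        long gib = 1L << 30;
        long blockSize = 128 * mib; // assumed block size

        // The same 1 TiB of data, stored two ways.
        long smallCount = 1L << 20; // 1,048,576 files of 1 MiB each
        long largeCount = 1024L;    // 1,024 files of 1 GiB each

        long smallObjects = namespaceObjects(smallCount, mib, blockSize);
        long largeObjects = namespaceObjects(largeCount, gib, blockSize);

        System.out.printf("small files: %d objects, ~%d heap bytes, %d map tasks%n",
                smallObjects, estimatedHeapBytes(smallObjects),
                mapTasks(smallCount, mib, blockSize));
        System.out.printf("large files: %d objects, ~%d heap bytes, %d map tasks%n",
                largeObjects, estimatedHeapBytes(largeObjects),
                mapTasks(largeCount, gib, blockSize));
    }
}
```

Under these assumptions, the layout with many small files needs a couple of hundred times more NameNode objects and over a hundred times more map tasks for the same amount of data, which is the pressure that packing small files into a larger container file is meant to relieve.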
