...

A Pragmatic Introduction to Hadoop and MapReduce

Learn about the MapReduce paradigm and Hadoop.

Distributed file systems

Traditional techniques might not suffice to extract meaningful information from big data.

Google recognized this problem early on and identified a potentially robust solution.

That’s why they created a distributed file system called the Google File System (GFS) and published a paper describing it.

In a nutshell, GFS is a scalable distributed file system for distributed data-intensive applications. Google's ultimate goal was to process massive datasets, index billions of web pages, and extract knowledge from them efficiently.

Another important requirement was that GFS should run on a cluster of commodity servers, that is, servers built from everyday hardware like that found in most standard computers.

That’s because specialized hardware is expensive and produced on demand. Due to the size of clusters ...