MapReduce's Manager-Worker Architecture

Let's study the architecture of MapReduce and the guarantees provided by it.

The manager node is responsible for scheduling tasks for worker nodes and managing their execution, as shown in the following illustration:

Architecture of MapReduce

Apart from defining the map and reduce functions, the user can also specify the number M of map tasks, the number R of reduce tasks, the input and output files, and a partitioning function that defines how key-value pairs from the map tasks are partitioned before being processed by the reduce tasks. By default, a hash partitioner is used that selects a reduce task using the formula hash(key) mod R.
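
To make the default partitioning rule concrete, here is a minimal Python sketch of hash(key) mod R. The function names default_partition and partition_pairs are hypothetical and are not part of any MapReduce API. The sketch also assumes zlib.crc32 as a stand-in hash function, because Python's built-in hash() for strings is salted per process and would not give the same reduce task across workers.

```python
import zlib


def default_partition(key: str, num_reduce_tasks: int) -> int:
    """Return a reduce task index in [0, R) computed as hash(key) mod R."""
    if num_reduce_tasks <= 0:
        raise ValueError("num_reduce_tasks must be positive")
    # Assumption: crc32 stands in for hash(). It is deterministic across
    # processes, so every worker maps a given key to the same reduce task.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def partition_pairs(pairs, num_reduce_tasks, partition=default_partition):
    """Group map output (key, value) pairs into R buckets, one per reduce task."""
    buckets = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        buckets[partition(key, num_reduce_tasks)].append((key, value))
    return buckets


if __name__ == "__main__":
    map_output = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
    for index, bucket in enumerate(partition_pairs(map_output, 3)):
        print(index, bucket)
```

Because the partition depends only on the key and R, all pairs that share a key land in the same bucket, which is what lets a single reduce task see every value for that key. A user-supplied partitioning function would replace default_partition while keeping the same contract: take a key and R, return an index in [0, R).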

Steps for the execution of MapReduce

The execution of MapReduce proceeds in the following way:

  • The framework divides the input files into M pieces, called input splits, typically between 16 and 64 MB per split.

  • It then starts an instance of a manager node and multiple worker node instances on an existing ...