Reducer

This lesson introduces the implementation of the Reduce phase of a MapReduce job.


Reducer

Let’s look at the reduce phase of a MapReduce job. Reduce tasks work on the intermediate output produced by the map tasks. Like map tasks, reduce tasks are completely independent of each other; they do not communicate. Each reduce task does, however, need the intermediate key/value pairs produced by the map tasks as its input. The Hadoop framework handles this transfer of data, and it requires no user intervention.
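To make the data flow concrete, here is a minimal plain-Python sketch of a word-count style reduce step. It is not Hadoop code: the names `shuffle`, `reduce_counts`, and the sample `map_outputs` are assumptions made for illustration, and the grouping that `shuffle` performs stands in for work the framework does on the user's behalf.

```python
from collections import defaultdict

def shuffle(map_outputs):
    """Group intermediate (key, value) pairs by key.

    Stands in for the framework's transfer of map output to reducers;
    in a real job the user does not write this step.
    """
    grouped = defaultdict(list)
    for pairs in map_outputs:          # one list of pairs per map task
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Hypothetical reduce function: sum all values seen for one key."""
    return key, sum(values)

# Assumed intermediate output from two independent map tasks.
map_outputs = [
    [("car", 1), ("bus", 1), ("car", 1)],
    [("bus", 1), ("train", 1)],
]

grouped = shuffle(map_outputs)

# Each call depends only on its own key's values, so the calls could run
# on different nodes without communicating with each other.
results = dict(reduce_counts(key, values) for key, values in grouped.items())
print(results)
```

Note that `reduce_counts` never looks at another key's data, which is what lets reduce tasks run independently.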

A reducer is a node that runs a reduce task. Each reducer processes the data in its assigned partition. The map tasks partition their output so that each partition can be assigned to exactly one reduce task.

Note that all records for a given key reside in a single partition, allowing a single reduce task to process all data for a given ...
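The partitioning described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Hadoop implementation: `partition_for`, `partition_map_output`, the sample data, and the choice of three reducers are all hypothetical, and `zlib.crc32` is used only as a deterministic stand-in for whatever hash function the job is configured with.

```python
import zlib
from collections import defaultdict

def partition_for(key, num_reducers):
    """Pick a partition from a deterministic hash of the key (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_map_output(pairs, num_reducers):
    """Split one map task's output into per-reducer partitions."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return partitions

NUM_REDUCERS = 3  # assumed number of reduce tasks

# Assumed intermediate output from two map tasks.
map_task_outputs = [
    [("car", 1), ("bus", 1), ("car", 1)],
    [("bus", 1), ("train", 1)],
]

# Every map task partitions its own output with the same function.
per_task = [partition_map_output(p, NUM_REDUCERS) for p in map_task_outputs]

# Reduce task i collects partition i from every map task.
reducer_inputs = {i: [] for i in range(NUM_REDUCERS)}
for partitions in per_task:
    for i, pairs in partitions.items():
        reducer_inputs[i].extend(pairs)

# Record which reducers saw each key; every key should map to exactly one.
key_locations = defaultdict(set)
for i, pairs in reducer_inputs.items():
    for key, _ in pairs:
        key_locations[key].add(i)
```

Because every map task applies the same function to the same key, all pairs for that key end up at one reducer regardless of which map task emitted them. A reducer may receive no keys, one key, or several keys, depending on how the hash distributes them.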
