Combiner and Partitioner

This lesson covers the combiner and partitioner hooks of a MapReduce job.


In this lesson, we implement two optional MapReduce features that we discussed earlier: the combiner and the partitioner.

Combiner

We can specify a class that processes the output of a map task, key by key, before that output is sent to the reducers. One reason to implement a combiner is to aggregate the intermediate map output locally, so that fewer bytes are transferred over the wire during the shuffle. Transferring data over a network introduces significant latency, so the less data we put on the wire, the better.

In our mapper class, we output a count of 1 every time we come across a car make. As a result, a single map task emits many identical key/value pairs, e.g. (Tesla, 1) once for every Tesla it encounters. Instead, we can sum all the values for the key Tesla on the map side and output a single key/value pair for Tesla. Our combiner class performs this summation.
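The following is a minimal plain-Java sketch of that map-side summation. It is written without the Hadoop API so that it runs on its own, and it is not the course's own combiner class. The class name `CombinerSketch`, the `map` and `combine` methods, and the sample list of makes are hypothetical. The sketch prints how many intermediate pairs one map task would emit with and without combining.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of a word-count style combiner.
public class CombinerSketch {

    // Mimics the mapper described above: emits one (make, 1) pair
    // for every car make seen in this map task's input.
    static List<Map.Entry<String, Integer>> map(List<String> makes) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String make : makes) {
            out.add(new SimpleEntry<>(make, 1));
        }
        return out;
    }

    // Combiner step: sums the values for each key within ONE map task's
    // output, so each key leaves the task as a single pair.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>(); // keeps first-seen key order
        for (Map.Entry<String, Integer> pair : mapOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical input split for one map task.
        List<String> makes = List.of("Tesla", "Toyota", "Tesla", "Honda", "Tesla");

        List<Map.Entry<String, Integer>> mapOutput = map(makes);
        Map<String, Integer> combined = combine(mapOutput);

        System.out.println("pairs without combiner: " + mapOutput.size());
        System.out.println("pairs with combiner:    " + combined.size());
        System.out.println(combined);
    }
}
```

Running `main` reports 5 pairs without the combiner and 3 with it, and prints `{Tesla=3, Toyota=1, Honda=1}`. In an actual Hadoop job, the same loop would typically sit in the `reduce` method of a class extending `org.apache.hadoop.mapreduce.Reducer` (assuming the mapper emits `Text` keys and `IntWritable` values) and would be registered with `job.setCombinerClass(...)`. Because Hadoop may run a combiner zero, one, or several times, its input and output types must match the mapper's output types, and the combining function should be associative and commutative, as summation is.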
