Combiner and Partitioner

This lesson covers the combiner and partitioner hooks of a MapReduce job.

We'll cover the following...

Combiner and Partitioner
Combiner
Partitioner

Combiner and Partitioner

In this lesson, we implement two optional features of MapReduce that we discussed earlier.

Combiner

We can specify a combiner class that acts on the output of a map task for each key. One reason to implement a combiner is to aggregate the intermediate map output locally, which reduces the number of bytes transferred over the wire during the shuffle. Transferring data over a network introduces significant latency, so the less data put on the wire, the better.

In our mapper class, we output a count of 1 every time we come across a car make. ...
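To make the effect of local aggregation concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API; the class name CombinerSketch, the methods map and combine, and the sample car makes are all hypothetical and exist only for illustration. It simulates one map task that emits a count of 1 per car make, then sums those counts per key before anything would be sent over the network.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a combiner; not Hadoop's actual API.
class CombinerSketch {

    // Map step: emit a (make, 1) pair for every car make in one input split.
    static List<Map.Entry<String, Integer>> map(List<String> makes) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String make : makes) {
            out.add(new SimpleEntry<>(make, 1));
        }
        return out;
    }

    // Combine step: sum the counts per key on the map side, so that
    // only one record per distinct key is left to shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Assumed sample input for a single map task.
        List<String> split = Arrays.asList(
                "Toyota", "Honda", "Toyota", "Ford", "Toyota", "Honda");

        List<Map.Entry<String, Integer>> mapOutput = map(split);
        Map<String, Integer> combined = combine(mapOutput);

        System.out.println("Records without combiner: " + mapOutput.size());
        System.out.println("Records with combiner: " + combined.size());
        System.out.println(combined);
    }
}
```

With this sample input, six (make, 1) records shrink to three records, one per distinct make. Summation suits a combiner because it is commutative and associative, so the reducer computes the same totals whether or not the combiner ran on a given map task.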
