Mapper

This lesson describes the Mapper class, which Hadoop uses to implement the Map phase.

Mapper

We’ll start by examining Hadoop’s Java classes to see how the abstract concepts of map and reduce are translated into code. The Map phase of a MapReduce job is implemented by a Java class called Mapper, which maps input key/value pairs to a set of intermediate key/value pairs. Conceptually, a mapper performs parsing, projection (selecting the fields of interest from the input), and filtering (removing uninteresting or malformed records). The Mapper class is defined in the package org.apache.hadoop.mapreduce as follows:
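To make parsing, projection, and filtering concrete, here is a minimal plain-Java sketch that does not depend on Hadoop. It assumes a hypothetical comma-separated record format, date,stationId,temperature. The class name RecordMapperSketch and its map method are illustrative and are not part of Hadoop's API. For each line, the sketch emits at most one (stationId, temperature) pair. It emits nothing for malformed lines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; the record format "date,stationId,temperature" is an assumption.
class RecordMapperSketch {

    // Maps one input record to zero or more (stationId, temperature) pairs.
    // The offset parameter mimics a line-position input key and is not used here.
    static List<Map.Entry<String, Double>> map(long offset, String line) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();

        String[] fields = line.split(",");           // parsing: split the raw line into fields
        if (fields.length != 3) {
            return out;                               // filtering: wrong number of fields
        }

        String stationId = fields[1].trim();          // projection: keep only the fields we need
        double temperature;
        try {
            temperature = Double.parseDouble(fields[2].trim());
        } catch (NumberFormatException e) {
            return out;                               // filtering: value is not a number
        }

        out.add(new SimpleEntry<>(stationId, temperature)); // the date field is dropped
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {"2021-01-01,station42,23.5", "garbage", "2021-01-02,station7,n/a"};
        for (int i = 0; i < lines.length; i++) {
            System.out.println(map(i, lines[i]));
        }
    }
}
```

Running main prints [station42=23.5] for the first line and [] for each of the two malformed lines. The sketch returns a list rather than a single value because one input record can produce zero or more intermediate pairs.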

Mapper class

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...class body
}

The Mapper class provided by Hadoop serves as the base class from which we derive our own custom mappers. Both phases of a MapReduce job take key/value pairs as input and produce key/value pairs as output. The Mapper class therefore declares four generic type parameters. KEYIN and VALUEIN are the input key and value types, and KEYOUT and VALUEOUT are the output key and value types. The Mapper class defines a map(...) ...
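The following sketch shows how a derived mapper fixes the four type parameters. It runs without Hadoop on the classpath. Every name in it is hypothetical. SimpleMapper is a simplified stand-in for Hadoop's Mapper, and PairCollector stands in for whatever object receives the output pairs. It is not Hadoop's API. Real Hadoop mappers use Hadoop's Writable types, such as LongWritable, Text, and IntWritable, rather than Long, String, and Integer. The derived WordCountMapper takes (Long, String) as input and produces (String, Integer) as output. It emits (word, 1) for every word in a line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical output sink; NOT Hadoop's Context class.
interface PairCollector<K, V> {
    void write(K key, V value);
}

// Hypothetical, simplified analogue of Hadoop's Mapper: the type parameters name
// the input key/value types (KEYIN, VALUEIN) and output key/value types (KEYOUT, VALUEOUT).
abstract class SimpleMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // Called once per input record; may emit zero or more output pairs.
    abstract void map(KEYIN key, VALUEIN value, PairCollector<KEYOUT, VALUEOUT> out);
}

// A custom mapper derived from the generic base: (line offset, line text) -> (word, 1).
class WordCountMapper extends SimpleMapper<Long, String, String, Integer> {
    @Override
    void map(Long offset, String line, PairCollector<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {           // a blank line splits into a single empty token
                out.write(word, 1);
            }
        }
    }
}

// Stores emitted pairs in memory as "key=value" strings so they can be inspected.
class ListCollector<K, V> implements PairCollector<K, V> {
    final List<String> pairs = new ArrayList<>();

    @Override
    public void write(K key, V value) {
        pairs.add(key + "=" + value);
    }
}

class MapperDemo {
    public static void main(String[] args) {
        ListCollector<String, Integer> out = new ListCollector<>();
        new WordCountMapper().map(0L, "the quick the", out);
        System.out.println(out.pairs);
    }
}
```

Running MapperDemo prints [the=1, quick=1, the=1]. The mapper does not combine the two occurrences of "the". In MapReduce, aggregating the values for each key is the job of the Reduce phase.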
