Testing a MapReduce Program
In this lesson, we'll demonstrate how to test a MapReduce program on a local machine.
Testing MapReduce
So far, we have learned how to write mapper and reducer classes and their corresponding unit tests. But ideally, we want to test our MapReduce job end to end. There are different ways to run a MapReduce job:
- Using the ToolRunner class to run the MapReduce job on a local machine. The job must implement the Tool interface. This doesn't require any running Hadoop daemons.
- Setting up a Hadoop cluster on a local machine in pseudo-distributed mode and then submitting the job to the cluster.
- Submitting the MapReduce job to an actual cluster consisting of many machines.
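For the second option, a pseudo-distributed setup typically points the MapReduce framework at YARN through mapred-site.xml. A minimal sketch of that configuration (file location and surrounding setup steps are assumed, not taken from this lesson) might look like:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN in pseudo-distributed mode -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```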
We’ll start with the first option by writing a program that implements the Tool interface. Any class implementing the Tool interface must also implement the Configurable interface, since Tool extends Configurable. The easiest way for a MapReduce job is to extend Hadoop’s helper class Configured, which already implements Configurable. If we name our MapReduce job CarCounterMrProgram, the class signature would look as follows:
public class CarCounterMrProgram extends Configured implements Tool {
// ... class body
}
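To make the shape of such a driver concrete, here is a hedged sketch of what a full Tool implementation might look like. The mapper and reducer class names (CarMapper, CarReducer), the job name, and the use of positional input/output path arguments are assumptions for illustration, not the lesson's exact code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CarCounterMrProgram extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Run the job in-process, without any Hadoop daemons.
        conf.set("mapreduce.framework.name", "local");
        // Read input from and write output to the local filesystem.
        conf.set("fs.defaultFS", "file:///");

        Job job = Job.getInstance(conf, "car-counter");
        job.setJarByClass(CarCounterMrProgram.class);
        job.setMapperClass(CarMapper.class);    // hypothetical mapper class name
        job.setReducerClass(CarReducer.class);  // hypothetical reducer class name
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (e.g. -D key=value) before
        // handing the remaining arguments to run().
        System.exit(ToolRunner.run(new CarCounterMrProgram(), args));
    }
}
```

Note that ToolRunner.run calls run() with the configuration already injected via Configured, which is why extending Configured saves us from implementing getConf/setConf ourselves.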
The input to the program will live on the local disk. We create an object of type Configuration and use it to specify this information, along with setting mapreduce.framework.name to local. These changes are shown on lines 27 and 28 of the class CarCounterMrProgram in the code widget below. The class CarCounterMrProgram also represents our MapReduce job. We carry over the mapper and reducer classes created in the previous sections without any changes. We also create a class CarMRInputGenerator to generate random data. Read the comments in the code widget below and examine the various classes. Unfortunately, the code is not runnable in this environment because testing the MapReduce program requires hostname resolution, which poses challenges in a VM.
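Since the widget cannot run here, the end-to-end flow the job performs can still be illustrated with a plain-Java simulation of the map, shuffle, and reduce phases. This is only a conceptual sketch (the sample records and class name are invented, and it bypasses Hadoop entirely), but it shows what a successful local run should produce: one count per car make.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalPipelineSketch {

    // Map phase: emit a (make, 1) pair for each input record.
    static List<Map.Entry<String, Integer>> map(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String make : records) {
            pairs.add(Map.entry(make, 1));
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Toyota", "Honda", "Toyota");
        System.out.println(reduce(map(input))); // prints {Honda=1, Toyota=2}
    }
}
```

A unit-level check against this simulation mirrors what an end-to-end local run verifies: the reducer output contains exactly one entry per distinct make, with the correct total.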