Putting it Together
This lesson explains the end-to-end working of a MapReduce job.
Now that we know how MapReduce works, we can dive into the end-to-end workflow of a MapReduce job in a Hadoop cluster.
In the driver program of our example, you'll see the job is submitted using the method waitForCompletion():

job.waitForCompletion(true);

This method returns once the job has completed; its boolean return value indicates whether the job succeeded. A lot goes on behind the scenes before this method returns. We'll trace the various steps involved in the execution of a job when submitted to a Hadoop cluster.
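For context, here is a minimal, self-contained driver in the style of the canonical WordCount example. The class and job names are illustrative, but the API calls (Job.getInstance(), setMapperClass(), waitForCompletion(), and so on) are the standard org.apache.hadoop.mapreduce ones:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Emits (word, 1) for every token in the input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Blocks until the job completes; 'true' streams progress to the
        // console. Returns true only if the job succeeded.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```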
The JobSubmitter class is responsible for talking to the resource manager and retrieving a new application ID, which is used as the ID of the MapReduce job. The class also performs sanity checks, such as verifying that the output path has been specified and doesn't already exist, and that the input splits can be successfully computed.
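As an illustration (this is not JobSubmitter's actual source), the output-path check can be reproduced with the standard FileSystem API; the path below is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputPathCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical output directory, for illustration only.
        Path outputDir = new Path("/user/demo/output");

        // Submission fails if the output directory already exists,
        // so previous results are never silently overwritten.
        if (fs.exists(outputDir)) {
            throw new IllegalStateException(
                "Output directory already exists: " + outputDir);
        }
    }
}
```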
Next, the resources for running the job are copied over to HDFS into a staging directory whose path includes the job ID. The resources include the jar file holding the mapper and reducer code to execute; this file is renamed job.jar. Configuration files and metadata about the input splits are also copied. After the job finishes successfully, the framework deletes this staging directory. You can set the property
mapreduce.task.files.preserve.filepattern
to choose which files to keep for debugging purposes. The jar file is replicated across the cluster to be readily available for node managers to access in ...
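For example, here is a sketch of setting that property programmatically, using the standard Configuration API; the class name and the regex value are illustrative assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PreserveStagingFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The value is a regex matched against file names; ".*" (an
        // illustrative choice) preserves every file for post-mortem debugging.
        conf.set("mapreduce.task.files.preserve.filepattern", ".*");

        Job job = Job.getInstance(conf, "debug-run");
        // ... configure the mapper, reducer, and input/output paths as usual,
        // then submit with job.waitForCompletion(true).
    }
}
```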