Design Refinements in MapReduce: Part II
Let's analyze further refinements to the MapReduce design that provide insights, aid debugging, and handle errors efficiently.
We can incorporate the following refinements to gain insight into our system’s status and performance, along with error-handling mechanisms and debugging facilities. These refinements supplement the ones covered previously and improve the overall efficiency of the design.
Status information
Even with all the distribution and parallelization, a MapReduce job is a time-consuming process. For example, the best Hadoop (an open-source implementation of Google’s MapReduce library) performance to date
It’s beneficial for users to access the status of their MapReduce jobs so they can gain insights and make crucial decisions if any modifications are required.
Status pages
The manager hosts an internal HTTP server that gives users access to a set of status pages. These pages show the computation’s progress, such as the number of completed and in-progress tasks, the input, intermediate, and output data sizes, and the processing rates.
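To make these progress counters concrete, here is a minimal Python sketch of how a manager might track them and serve a snapshot over HTTP using the standard library’s `http.server`. The `JobStatus` class, the `make_handler` helper, the `/status` path, the JSON response format, and the definition of processing rate as input bytes processed per second are all illustrative assumptions, not part of any real MapReduce implementation.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from http.server import BaseHTTPRequestHandler
from typing import Optional


@dataclass
class JobStatus:
    """Hypothetical progress counters a manager could update as tasks run."""
    total_tasks: int
    completed_tasks: int = 0
    in_progress_tasks: int = 0
    input_bytes: int = 0
    intermediate_bytes: int = 0
    output_bytes: int = 0
    input_bytes_processed: int = 0
    start_time: float = field(default_factory=time.time)

    def snapshot(self, now: Optional[float] = None) -> dict:
        """Return the raw counters plus derived fields for the status page."""
        now = time.time() if now is None else now
        elapsed = max(now - self.start_time, 1e-9)  # guard against division by zero
        data = asdict(self)
        data["pending_tasks"] = (
            self.total_tasks - self.completed_tasks - self.in_progress_tasks
        )
        # Assumed definition: input bytes processed per second since the job started.
        data["processing_rate_bytes_per_s"] = self.input_bytes_processed / elapsed
        return data


def make_handler(status: JobStatus):
    """Build a request handler class bound to one job's status object."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/status":  # hypothetical URL for the status page
                self.send_error(404)
                return
            # Recompute the snapshot on every request so the page stays current.
            body = json.dumps(status.snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return StatusHandler
```

To serve the page, one could run `HTTPServer(("localhost", 8080), make_handler(job)).serve_forever()` (with `HTTPServer` imported from `http.server`) and fetch `/status`. Because the handler builds a fresh snapshot per request, users always see the latest counters the manager has recorded.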
These pages also show the number of failed tasks, the workers they were running on, and which Map or Reduce tasks they were processing, along with links to the standard error output of each task.
The status pages also give users links to the standard output files generated by each task.
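The failure section of a status page can be sketched in a few lines of Python. The snippet below renders a hypothetical HTML table of failed tasks, one row per failure, showing the task, whether it was a Map or Reduce task, the worker it ran on, and links to its standard error and standard output. The `TaskFailure` record, the `log_link` helper, and the `/logs/<task_id>/<stream>` URL layout are assumptions for illustration only.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class TaskFailure:
    """Hypothetical record of one failed task attempt."""
    task_id: str
    kind: str    # "map" or "reduce"
    worker: str  # host the task was running on when it failed


def log_link(task_id: str, stream: str) -> str:
    # Hypothetical URL layout; a real manager could use any scheme.
    return f"/logs/{task_id}/{stream}"


def render_failures(failures: List[TaskFailure]) -> str:
    """Render failed tasks as an HTML table with stderr/stdout links."""
    row_template = (
        "<tr><td>{task}</td><td>{kind}</td><td>{worker}</td>"
        '<td><a href="{err}">stderr</a> <a href="{out}">stdout</a></td></tr>'
    )
    rows = []
    for f in failures:
        rows.append(
            row_template.format(
                task=escape(f.task_id),
                kind=escape(f.kind),
                worker=escape(f.worker),
                err=escape(log_link(f.task_id, "stderr")),
                out=escape(log_link(f.task_id, "stdout")),
            )
        )
    header = "<tr><th>Task</th><th>Type</th><th>Worker</th><th>Logs</th></tr>"
    return f"<p>Failed tasks: {len(failures)}</p><table>{header}{''.join(rows)}</table>"
```

Escaping every field with `html.escape` keeps a task ID or hostname that contains characters such as `<` or `&` from breaking the page’s markup.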