Introduction to MapReduce
Let's look at the inspiration behind MapReduce, as well as its goal and how it works.
MapReduce is a framework for batch data processing originally developed internally at Google by Dean et al. It was later incorporated into the wider Apache Hadoop framework.
Inspiration
The framework draws inspiration from the field of functional programming and is based on the following main idea:
Idea
Many real-world computations can be expressed with the use of two main primitive functions: map and reduce.
Map
The map function processes a set of key-value pairs and produces another set of intermediate key-value pairs as output.
Reduce
The reduce function receives all the values for each key and returns a single value, essentially merging all the values according to some logic.
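As an illustrative sketch (not part of the original lesson), here is the classic word-count example in Python, assuming the map function emits a (word, 1) pair for every word and the reduce function sums the counts for each word:

```python
def map_fn(key, value):
    # key: document name, value: document contents.
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word, values: all intermediate counts emitted for that word.
    # Merge the values into a single result (here: their sum).
    return sum(values)
```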
An important property of map and reduce functions
The map and reduce functions can easily be parallelized and run across multiple machines for different parts of the dataset. As a result:
- The application code is responsible for defining these two functions.
- The framework is responsible for partitioning the data, scheduling the program’s execution across multiple nodes, handling node failures, and managing the required inter-machine communication.
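To make this division of responsibilities concrete, the following hypothetical single-machine driver sketches the framework's side of the contract: it feeds input pairs to map, groups the intermediate pairs by key (the shuffle), and applies reduce to each group. A real framework performs these steps across many machines and additionally handles partitioning, scheduling, and node failures, which are omitted here.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Hypothetical single-machine "framework": apply map to every input pair,
    # group the intermediate pairs by key (the shuffle), then reduce each group.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}

# Word count over two toy "documents", using map/reduce logic as sketched above.
word_map = lambda key, text: ((word, 1) for word in text.split())
word_reduce = lambda key, counts: sum(counts)

docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
print(run_mapreduce(docs, word_map, word_reduce))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```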