MapReduce in Batch Processing
Learn about the MapReduce programming model.
In this lesson, we will learn about a popular algorithm that is frequently used for batch processing of very large volumes of data. Google published this algorithm in 2004, and it was later adopted or extended by many data processing systems, such as Apache Hadoop and Apache Spark.
The MapReduce algorithm
We’ll look at this algorithm through an example. Imagine the following scenario:
- You have all the text of a piece of classic English literature.
- You want to count the occurrences of each word in the whole text.
- The data is stored in some persistent storage.
- The data is so large that it cannot be loaded into the memory of a single physical machine. This means you have to use