Real-world systems are rarely designed in one go; it often takes many iterations to improve the design. As initial versions of our system are deployed in production, we get usage data and possibly new insights. In this and the next lesson, we will refine several aspects of our MapReduce design.

We will go through our refinements in the order of the system's execution flow.

Input and output types

Let’s analyze the input and output types supported by the MapReduce library.

Input types

By default, the MapReduce library supports reading a limited set of input data types. Each input type implementation automatically splits its data into meaningful ranges, which are then processed by the Map tasks.

Example

As we know, the data gets partitioned into key-value pairs before it is processed by the Map tasks. In “text” mode, each line of the input is treated as one key-value pair, such that:

  • The key is the line's offset in the input file.
  • The value is the content of that line.

This mode ensures that the input is split only at line boundaries, so no line is cut in half between two Map tasks.
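
To make the “text” mode concrete, here is a minimal sketch in Python. It is not the MapReduce library's actual API: the function name text_records and the byte-range convention (a split owns every line that starts inside its range) are assumptions made for illustration. Each record is an (offset, line) pair, and a split that begins in the middle of a line skips ahead to the next line boundary.

```python
import io


def text_records(data: bytes, start: int, end: int):
    """Yield (offset, line) pairs for every line that begins in [start, end).

    Hypothetical helper: a line that starts before `end` is read in full,
    even if it runs past `end`, so lines are never cut between splits.
    """
    stream = io.BytesIO(data)
    if start > 0:
        # Back up one byte and discard through the next newline. If the byte
        # before `start` is a newline, this consumes only that newline, so a
        # line beginning exactly at `start` is kept by this split.
        stream.seek(start - 1)
        stream.readline()
    while True:
        offset = stream.tell()   # the key: byte offset of the line
        if offset >= end:
            break
        line = stream.readline()
        if not line:             # end of input
            break
        # The value: the content of the line, without the trailing newline.
        yield offset, line.rstrip(b"\n").decode("utf-8")


if __name__ == "__main__":
    sample = b"apple pie\nbanana\ncherry tart\n"
    # Two splits whose raw boundary (byte 12) falls inside "banana".
    print(list(text_records(sample, 0, 12)))
    # [(0, 'apple pie'), (10, 'banana')]
    print(list(text_records(sample, 12, len(sample))))
    # [(17, 'cherry tart')]
```

Even though the raw boundary at byte 12 lands inside the second line, the first split reads that line to its end and the second split starts at the next line, so every line reaches exactly one Map task.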
