Stream Processing
Understand what stream processing is and where it is used.
In the word-counting example for the MapReduce algorithm, the entire text was stored somewhere, and each mapper machine loaded a chunk of the data and processed it. The data was organized, more or less, in batches: the processing system loaded the batches, did its job, and eventually produced some form of output data.
In that example, the input was bounded. We assumed that we already had all the text of English literature, so all we had to do was run the MapReduce algorithm on the data and gather the results.
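To make the batch setting concrete, here is a minimal sketch of word counting over a bounded input, run on a single machine. The function names (`map_chunk`, `reduce_counts`, `batch_word_count`) and the sample chunks are made up for illustration; a real MapReduce job would distribute the chunks across many machines.

```python
from collections import Counter


def map_chunk(chunk):
    """Mapper step (illustrative): count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def reduce_counts(partial_counts):
    """Reducer step (illustrative): merge per-chunk counts into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts together
    return total


def batch_word_count(chunks):
    # The whole input exists before we start, so we can map every chunk
    # and then reduce once to get a final answer.
    return reduce_counts(map_chunk(chunk) for chunk in chunks)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the stored text.
    chunks = ["the quick brown fox", "jumps over the lazy dog", "the end"]
    print(batch_word_count(chunks)["the"])  # prints 3
```

Notice that `batch_word_count` can return a final answer only because the list of chunks has an end. That assumption is exactly what breaks once the data is unbounded.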
Now comes the important question.
What if the data is unbounded?
How to handle unbounded data
In a real-life system that needs data processing, the data is almost always unbounded. Let’s quickly discuss an example.
Assume the engineers at Instagram decided to analyze ...