The WordCount example
In this section, we present the MapReduce solution to the WordCount problem, sometimes called the Hello World example for MapReduce.
The diagram in Figure 11-2 shows the data flow for the WordCount program. On the left are two of the 80 files that are read into the program.
During the mapping stage, each word, followed by the number 1, is copied into a temporary file, one pair per line. Notice that many words appear more than once. For example, the word "image" appears five times among the 80 files (including both files shown), so the string "image 1" will appear five times in the temporary file. Each of the input files has about 110 words, so over 8,000 word-number pairs will be written to the temporary file.
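The mapping stage described above can be sketched in a few lines of Python. This is only an illustration, not the program the figure is drawn from: the function name map_words, the two sample "files," and the whitespace-based tokenization are all assumptions made for the example, and a list stands in for the temporary file.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit one (word, 1) pair for every word occurrence in the input lines."""
    for line in lines:
        # Assumption: words are separated by whitespace; no case folding
        # or punctuation stripping is applied.
        for word in line.split():
            yield word, 1


# Hypothetical stand-ins for two small input files.
sample_files: Dict[str, List[str]] = {
    "file1.txt": ["The image is an image of a tree"],
    "file2.txt": ["Each image tells a story"],
}

# Run the mapper over every file and collect all the pairs it emits.
pairs = [pair for lines in sample_files.values() for pair in map_words(lines)]

# One "word 1" string per line, as in the temporary file described above.
temp_lines = [f"{word} {count}" for word, count in pairs]

for entry in temp_lines:
    print(entry)
```

The list temp_lines holds one entry per word occurrence, so a word that occurs three times in the input contributes three identical lines. By the same reasoning, 80 files of about 110 words each would yield roughly 80 × 110 = 8,800 lines, which is consistent with the "over 8,000" pairs mentioned above.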
Note that this figure shows only a very small part of the data involved. The output from the mapping stage includes every word that is input, as many times as it appears. And the output from the grouping stage includes every...