Scalability
The great benefit of the MapReduce framework is that it is scalable. The WordCount program in Example1.java was run on 80 files containing fewer than 10,000 words. With little modification, it could be run on 80,000 files containing 10,000,000 words. That flexibility in software is called scalability.
To manage that thousand-fold increase in input, the hash table might have to be replaced. Even if we had enough memory to load a table that large, the Java processing would probably fail because of the proliferation of objects. Object-oriented programming is certainly the best way to implement an algorithm if you want clarity, speed, and flexibility. But it is not so efficient at handling large datasets.
We don't really need the hash table, which is instantiated at line 24 in Listing 11-1. We can implement the same idea by hashing the data into a set of files instead. This is illustrated in Figure 11-3.
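The idea of hashing data into files can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the listing: the class name FileHashSketch, the methods chunkOf, partition, and countChunk, and the chunk file names are all hypothetical. Each word is appended to the chunk file selected by its hash code, so every occurrence of a given word lands in the same file, and each chunk can then be counted on its own with a small in-memory map.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch: replaces one large in-memory hash table with chunk files.
class FileHashSketch {

    // Maps a word to a chunk index in the range [0, numChunks).
    // floorMod keeps the index non-negative even when hashCode() is negative.
    static int chunkOf(String word, int numChunks) {
        return Math.floorMod(word.hashCode(), numChunks);
    }

    // Writes each word, one per line, to the chunk file chosen by its hash.
    // The file names chunk0.txt, chunk1.txt, ... are an assumption of this sketch.
    static void partition(List<String> words, Path dir, int numChunks) throws IOException {
        Files.createDirectories(dir);
        BufferedWriter[] out = new BufferedWriter[numChunks];
        try {
            for (int i = 0; i < numChunks; i++) {
                Path chunk = dir.resolve("chunk" + i + ".txt");
                out[i] = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
            }
            for (String word : words) {
                BufferedWriter writer = out[chunkOf(word, numChunks)];
                writer.write(word);
                writer.newLine();
            }
        } finally {
            for (BufferedWriter writer : out) {
                if (writer != null) {
                    writer.close();
                }
            }
        }
    }

    // Counts the words in a single chunk file. Only this chunk's distinct
    // words are held in memory at any one time.
    static Map<String, Integer> countChunk(Path chunk) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (Stream<String> lines = Files.lines(chunk, StandardCharsets.UTF_8)) {
            lines.forEach(word -> counts.merge(word, 1, Integer::sum));
        }
        return counts;
    }
}
```

Because a word can appear in only one chunk, the per-chunk counts never overlap, so they can be computed one file at a time, or on separate machines, and simply concatenated. The number of chunks is a tuning choice: more chunks means smaller in-memory maps but more open files during partitioning.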
Replacing the hash table with file chunks would require modifying the code at lines 34...