Time for action – WordCount with a combiner
Let's add a combiner to our first WordCount example. In fact, let's use our reducer as the combiner. Because a combiner must have the same interface as a reducer, you'll often see a reducer class reused this way. Note, however, that the type of processing the reducer performs determines whether it is a true candidate for a combiner; we'll discuss this later. Since we want to count word occurrences, we can do a partial count on each map node and pass these subtotals to the reducer.
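To make the idea of subtotals concrete, here is a minimal sketch in plain Java that imitates the flow without using Hadoop at all. The class name CombinerSketch, its method names, and the two sample input strings are assumptions made for illustration; they are not part of WordCount1.java or of the Hadoop API. The same summing function plays the role of both combiner and reducer, which is the property we rely on when we reuse the reducer class.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of map -> (combine) -> reduce.
public class CombinerSketch {

    // Stand-in for the map phase: emit one (word, 1) pair per token.
    public static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Stand-in for both the combiner and the reducer: sum the values per key.
    public static Map<String, Integer> sum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Collect the records that would be sent from the map side to the reducer.
    // With useCombiner set, each split's output is summed locally first.
    public static List<Map.Entry<String, Integer>> shuffle(List<String> splits,
                                                           boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            if (useCombiner) {
                shuffled.addAll(sum(mapped).entrySet());
            } else {
                shuffled.addAll(mapped);
            }
        }
        return shuffled;
    }

    public static void main(String[] args) {
        // Two made-up input splits, each handled by its own "map task".
        List<String> splits = Arrays.asList("the cat the dog", "the dog the end");

        List<Map.Entry<String, Integer>> plain = shuffle(splits, false);
        List<Map.Entry<String, Integer>> combined = shuffle(splits, true);

        System.out.println(plain.size());    // prints 8
        System.out.println(combined.size()); // prints 6
        System.out.println(sum(plain));      // prints {cat=1, dog=2, end=1, the=4}
        System.out.println(sum(combined));   // prints {cat=1, dog=2, end=1, the=4}
    }
}
```

In this toy run the final counts are identical either way, but fewer records cross from the map side to the reducer when the local sum is applied first. A real job behaves differently in its details (the framework decides if and when a combiner runs), so treat this only as an illustration of why partial counts are safe for a sum.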
Copy WordCount1.java to WordCount2.java and change the driver class to add the following line between the definitions of the Mapper and Reducer classes:
job.setCombinerClass(WordCountReducer.class);
Also change the class name to WordCount2 and then compile it.
$ javac WordCount2.java
Create the JAR file.
$ jar cvf wc2.jar WordCount2*class
Run the job on Hadoop.
$ hadoop jar wc2.jar WordCount2 test.txt output
Examine the output.
$ hadoop fs -cat output/part-r-00000