Time for action – WordCount, the Hello World of MapReduce
Many applications, over time, acquire a canonical example that no beginner's guide should be without. For Hadoop, this is WordCount – an example bundled with Hadoop that counts the frequency of words in an input text file.
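Before running the bundled example, it may help to see the computation WordCount performs. The following is a minimal plain-Java sketch that does not use the Hadoop API. It assumes the input is an in-memory list of lines, and the class name WordCountSketch and its methods are hypothetical. It simulates the three stages: the map step emits a (word, 1) pair per token, the shuffle groups pairs by word, and the reduce step sums the counts for each word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the WordCount idea (not Hadoop code).
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle + reduce steps: group the pairs by word and sum their counts.
    // A TreeMap keeps the words sorted, much as reducer input arrives sorted by key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run the map step over every line, then reduce all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");
        Map<String, Integer> counts = countWords(input);
        // Print one "word<TAB>count" line per distinct word.
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Because this sketch splits only on whitespace, punctuation stays attached to words, so "test." and "test" would be counted separately. In a real Hadoop job the map and reduce steps run as separate tasks across the cluster rather than in one process.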
First, execute the following commands:
$ hadoop dfs -mkdir data
$ hadoop dfs -cp test.txt data
$ hadoop dfs -ls data
Found 1 items
-rw-r--r--   1 hadoop supergroup   16 2012-10-26 23:20 /user/hadoop/data/test.txt
Now execute the following command:
$ hadoop jar hadoop/hadoop-examples-1.0.4.jar wordcount data out
12/10/26 23:22:49 INFO input.FileInputFormat: Total input paths to process : 1
12/10/26 23:22:50 INFO mapred.JobClient: Running job: job_201210262315_0002
12/10/26 23:22:51 INFO mapred.JobClient:  map 0% reduce 0%
12/10/26 23:23:03 INFO mapred.JobClient:  map 100% reduce 0%
12/10/26 23:23:15 INFO mapred.JobClient:  map 100% reduce 100%
12/10/26 23:23:17 INFO mapred.JobClient: Job complete: job_201210262315_0002
...