Time for action – implementing WordCount using Streaming
Let's flog the dead horse of WordCount one more time and implement it using Streaming by performing the following steps:
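Save the following file to wcmapper.rb:
#!/usr/bin/env ruby
# Mapper: read lines from standard input and emit "word<TAB>1" for each token.
while line = gets
  # Tokens are taken to be tab-separated, as in the original listing.
  words = line.split("\t")
  words.each { |word| puts word.strip + "\t1" }
end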
Make the file executable by executing the following command:
$ chmod +x wcmapper.rb
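Save the following file to wcreducer.rb:
#!/usr/bin/env ruby
# Reducer: input arrives sorted by key, so equal words are adjacent.
current = nil
count = 0
while line = gets
  word, counter = line.split("\t")
  if word == current
    count = count + 1
  else
    # The key changed: emit the total for the previous word, if there was one.
    puts current + "\t" + count.to_s if current
    current = word
    count = 1
  end
end
# Emit the final word; the guard avoids an error on empty input.
puts current + "\t" + count.to_s if current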
Make the file executable by executing the following command:
$ chmod +x wcreducer.rb
Execute the scripts as a Streaming job using the datafile from the previous chapter:
$ hadoop jar hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar -file wcmapper.rb -mapper...
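The data flow that Streaming applies to the two scripts (map each line, sort the emitted records by key, then reduce) can be sketched in a single Ruby file for a quick local check. This is only an illustration under stated assumptions: the method names map_line, reduce_sorted, and run_local are hypothetical helpers, not part of Hadoop Streaming, and a plain in-memory sort stands in for the framework's shuffle and sort phase.

```ruby
# Local, in-process sketch of the Streaming flow: map -> sort -> reduce.
# map_line, reduce_sorted and run_local are hypothetical names used only
# for this illustration; Hadoop itself is not involved.

# Same logic as wcmapper.rb: one "word<TAB>1" record per tab-separated token.
def map_line(line)
  line.split("\t").map { |word| "#{word.strip}\t1" }
end

# Same logic as wcreducer.rb: relies on records being sorted so that
# equal words are adjacent.
def reduce_sorted(records)
  output = []
  current = nil
  count = 0
  records.each do |record|
    word, _one = record.split("\t")
    if word == current
      count += 1
    else
      output << "#{current}\t#{count}" if current
      current = word
      count = 1
    end
  end
  output << "#{current}\t#{count}" if current
  output
end

# Stand-in for the framework: map every line, sort the records, reduce.
def run_local(lines)
  mapped = lines.flat_map { |line| map_line(line) }
  reduce_sorted(mapped.sort)
end

# Two small sample lines (made-up data, not the chapter's data file).
puts run_local(["b\ta\tb\n", "a\tc\n"])
```

For the two sample lines, the sketch reports a count of 2 for a, 2 for b, and 1 for c. The sort step is what lets the reducer keep only a single running counter: without it, occurrences of the same word would not be adjacent and each word would be emitted more than once.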