Using languages other than Java with Hadoop
We have mentioned previously that MapReduce programs don't have to be written in Java. Most programs are written in Java, but there are several reasons why you may want or need to write your map and reduce tasks in another language. Perhaps you have existing code to leverage or need to use third-party binaries—the reasons are varied and valid.
Hadoop provides a number of mechanisms to aid non-Java development, primary amongst these are Hadoop Pipes that provides a native C++ interface to Hadoop and Hadoop Streaming that allows any program that uses standard input and output to be used for map and reduce tasks. We will use Hadoop Streaming heavily in this chapter.
How Hadoop Streaming works
With the MapReduce Java API, both map and reduce tasks provide implementations for methods that contain the task functionality. These methods receive the input to the task as method arguments and then output results via the Context
object. This is a clear and...