Hadoop streaming
We have mentioned previously that MapReduce programs don't have to be written in Java. There are several reasons why you might want or need to write your map and reduce tasks in another language. Perhaps you have existing code to leverage or need to use third-party binaries—the reasons are varied and valid.
Hadoop provides a number of mechanisms to aid non-Java development, primary amongst which are Hadoop pipes that provide a native C++ interface and Hadoop streaming that allows any program that uses standard input and output to be used for map and reduce tasks.
With the MapReduce Java API, both map and reduce tasks provide implementations for methods that contain the task functionality. These methods receive the input to the task as method arguments and then output results via the Context object. This is a clear and type-safe interface, but it is by definition Java-specific.
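To make the shape of that interface concrete, here is a minimal, self-contained sketch of the pattern: input arrives as method arguments and output leaves through a context object. It does not use the Hadoop classes. SketchContext and WordCountSketch are hypothetical names invented for this illustration, and plain Java types stand in for Hadoop's key and value types.

```java
import java.util.ArrayList;
import java.util.List;

public class WordCountSketch {

    // Hypothetical stand-in for the role the Context object plays;
    // this is NOT Hadoop's own class, only an illustration of the idea.
    public interface SketchContext {
        void write(String key, int value);
    }

    // The task logic lives in a method: the input record is passed in as
    // arguments (here a byte offset and a line of text), and every output
    // pair is handed to the context rather than returned.
    public static void map(long offset, String line, SketchContext context) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);
            }
        }
    }

    public static void main(String[] args) {
        // Collect the emitted pairs as tab-separated text so they can be printed.
        List<String> emitted = new ArrayList<>();
        map(0L, "the quick the", (key, value) -> emitted.add(key + "\t" + value));
        emitted.forEach(System.out::println);
    }
}
```

The types in the method signature are what make the interface type-safe, and they are also what tie it to Java: a program in another language has no way to receive these arguments or call the context.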
Hadoop streaming takes a different approach. With streaming, you write a map task that reads its...