Walking through a run of a MapReduce job
To explore the relationship between mapper and reducer in more detail, and to expose some of Hadoop's inner workings, we'll now go through how a MapReduce job is executed. The description applies to MapReduce in both Hadoop 1 and Hadoop 2, even though the latter is implemented very differently on top of YARN, which we'll discuss later in this chapter. Additional information on the services described in this section, as well as suggestions for troubleshooting MapReduce applications, can be found in Chapter 10, Running a Hadoop Cluster.
Startup
The driver is the only piece of code that runs on our local machine, and the call to Job.waitForCompletion() starts the communication with the JobTracker, which is the master node in the MapReduce system. The JobTracker is responsible for all aspects of job scheduling and execution, so it becomes our primary interface when performing any task related to job management.
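To make the driver's role concrete, the following is a toy model in plain Java of the submit-then-wait pattern described above. It is a sketch under stated assumptions, not Hadoop code: ToyJobTracker, ToyDriver, and all of their methods are hypothetical names invented for illustration, and the in-memory status() call stands in for the remote communication a real driver would have with the JobTracker.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes belong to Hadoop.
// ToyJobTracker is a hypothetical in-memory stand-in for the master service.
final class ToyJobTracker {
    enum State { RUNNING, SUCCEEDED, FAILED }

    private static final class JobInfo {
        int tasksLeft;
        final boolean failsAtEnd;

        JobInfo(int tasksLeft, boolean failsAtEnd) {
            this.tasksLeft = tasksLeft;
            this.failsAtEnd = failsAtEnd;
        }
    }

    private final Map<Integer, JobInfo> jobs = new HashMap<>();
    private int nextJobId = 1;

    // Accepts a job and hands back an identifier the driver can ask about later.
    int submit(int taskCount, boolean failsAtEnd) {
        int jobId = nextJobId++;
        jobs.put(jobId, new JobInfo(taskCount, failsAtEnd));
        return jobId;
    }

    // Reports the job's state; each call pretends one more task has finished.
    State status(int jobId) {
        JobInfo job = jobs.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("unknown job " + jobId);
        }
        if (job.tasksLeft > 0) {
            job.tasksLeft--;
            return State.RUNNING;
        }
        return job.failsAtEnd ? State.FAILED : State.SUCCEEDED;
    }
}

// Hypothetical driver: the only part that would run on our local machine.
final class ToyDriver {
    // Submits the job, then keeps asking for its status until it is no longer running.
    static boolean waitForCompletion(ToyJobTracker tracker, int taskCount, boolean failsAtEnd) {
        int jobId = tracker.submit(taskCount, failsAtEnd);
        ToyJobTracker.State state = tracker.status(jobId);
        while (state == ToyJobTracker.State.RUNNING) {
            state = tracker.status(jobId);
        }
        return state == ToyJobTracker.State.SUCCEEDED;
    }

    public static void main(String[] args) {
        ToyJobTracker tracker = new ToyJobTracker();
        boolean succeeded = waitForCompletion(tracker, 3, false);
        System.out.println("job succeeded: " + succeeded);
    }
}
```

The point of the sketch is the division of labor: the driver does no processing itself, it only hands the job over and then waits on the master's view of the job's state. A real driver would pause between status checks and would typically convert the boolean result into a process exit code.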
To share resources on the cluster the JobTracker...