Hadoop MapReduce internals
The MapReduce programming model can express many large-scale data-processing problems as one or more map and reduce steps, and it can be implemented efficiently on a large number of machines to handle large amounts of data. This matters in a Big Data context, where the data being processed may be too large to store on a single machine.
In a typical Hadoop MapReduce deployment, input data is divided into blocks that are distributed across the nodes of a cluster, and the framework exploits data locality by shipping computation to the data rather than moving data to where it is processed. Because the scheduler tries to run each map task on a node that already stores its input block, most blocks are read from local disk, which is fast, and many blocks can be read on different nodes in parallel. As a result, MapReduce can achieve very high aggregate I/O bandwidth and data-processing rates.
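To make the aggregate-bandwidth argument concrete, the following sketch compares the idealized time to scan a dataset on one node versus many nodes that each read only their local share in parallel. The figures (1 TB of input, 100 MB/s sequential read per node, 100 nodes) are hypothetical, and the model ignores scheduling overhead, stragglers, and non-local reads.

```python
def scan_time_seconds(data_bytes: float, per_node_bytes_per_sec: float, nodes: int) -> float:
    """Idealized time to read data_bytes when the data is spread evenly
    over `nodes` machines and each machine reads only its local share."""
    aggregate_bandwidth = per_node_bytes_per_sec * nodes  # bytes/s across the cluster
    return data_bytes / aggregate_bandwidth


TB = 10**12
MB = 10**6

# Hypothetical figures: 1 TB of input, 100 MB/s sequential read per node.
single = scan_time_seconds(1 * TB, 100 * MB, nodes=1)
cluster = scan_time_seconds(1 * TB, 100 * MB, nodes=100)
print(f"1 node: {single:.0f} s, 100 nodes: {cluster:.0f} s")  # 1 node: 10000 s, 100 nodes: 100 s
```

Under these assumptions, the scan time falls in proportion to the number of nodes reading locally, which is the effect data locality is meant to preserve.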
To launch a MapReduce job, Hadoop creates an instance of the MapReduce application and submits the job to the JobTracker. The job is then divided into map tasks (also called mappers) and reduce tasks (also called reducers).
When Hadoop launches a MapReduce job, it divides the input dataset into input splits of roughly equal size (by default, one per HDFS block), and each split is processed by one map task. Map tasks are assigned to TaskTracker nodes through a heartbeat protocol.
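A simplified sketch of how a file might be cut into splits, one per map task: the file is divided into chunks of the block size, with a possibly shorter final chunk. The function name `compute_splits` and the 128 MB block size are assumptions for illustration; Hadoop's actual `FileInputFormat` also honors configured minimum and maximum split sizes and applies extra rules to the final split.

```python
def compute_splits(file_size: int, block_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering file_size bytes in
    block_size chunks; the final split may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024
splits = compute_splits(300 * MB, 128 * MB)
# One map task would be launched per split.
print(len(splits), [length // MB for _, length in splits])  # 3 [128, 128, 44]
```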
Each task runs in an available slot on a worker node. Every worker node is configured with a fixed number of map slots and a separate fixed number of reduce slots. If all slots of the required type are occupied, pending tasks must wait until a slot is freed.
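Because slot counts are fixed, a job with more map tasks than map slots runs in successive waves. The sketch below computes the number of map waves under simplifying assumptions: every slot is free when the job starts and all tasks take the same time. The function name and cluster sizes are hypothetical.

```python
def map_waves(num_map_tasks: int, worker_nodes: int, map_slots_per_node: int) -> int:
    """Number of rounds needed to run all map tasks when each worker
    node offers a fixed number of map slots."""
    total_slots = worker_nodes * map_slots_per_node
    # Integer ceiling division: tasks beyond total_slots wait for a later wave.
    return (num_map_tasks + total_slots - 1) // total_slots


# Hypothetical cluster: 20 workers with 4 map slots each (80 slots in total).
print(map_waves(num_map_tasks=200, worker_nodes=20, map_slots_per_node=4))  # 3
```

Here 80 tasks run in the first wave, 80 in the second, and the remaining 40 in the third; tasks in the later waves are the pending tasks that wait for freed slots.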
Each TaskTracker node periodically sends a heartbeat to the JobTracker reporting its state, including whether it has free slots. When a TaskTracker has a free slot, the JobTracker assigns it a new task. For map tasks, the JobTracker takes data locality into account: it first tries to assign a task whose input block is stored on that TaskTracker's node. If no such task is pending, it assigns a rack-local task (one whose input block is on another node in the same rack) or, failing that, a task whose input block is on another rack.
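The locality preference can be sketched as a three-level search performed when a TaskTracker reports a free map slot: first a pending task with a replica of its input block on that node, then one with a replica on the same rack, then any pending task. The function `pick_map_task` and its data structures (`pending`, `replicas`, `rack_of`) are hypothetical simplifications, not the JobTracker's actual internals.

```python
from typing import Optional


def pick_map_task(
    tracker_host: str,
    pending: list[str],
    replicas: dict[str, set[str]],
    rack_of: dict[str, str],
) -> tuple[Optional[str], str]:
    """Choose a pending map task for tracker_host, preferring
    node-local, then rack-local, then any remaining task."""
    # 1. Node-local: a replica of the task's input block lives on this host.
    for task in pending:
        if tracker_host in replicas[task]:
            return task, "node-local"
    # 2. Rack-local: a replica lives on some host in the same rack.
    rack = rack_of[tracker_host]
    for task in pending:
        if any(rack_of[host] == rack for host in replicas[task]):
            return task, "rack-local"
    # 3. Off-rack: fall back to any pending task.
    if pending:
        return pending[0], "off-rack"
    return None, "none"


rack_of = {"h1": "r1", "h2": "r1", "h3": "r2"}
replicas = {"m0": {"h3"}, "m1": {"h2"}, "m2": {"h1"}}
print(pick_map_task("h1", ["m0", "m1", "m2"], replicas, rack_of))  # ('m2', 'node-local')
print(pick_map_task("h1", ["m0", "m1"], replicas, rack_of))        # ('m1', 'rack-local')
print(pick_map_task("h1", ["m0"], replicas, rack_of))              # ('m0', 'off-rack')
```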
Once all map tasks have completed, the runtime system groups the intermediate key-value pairs by key, and the reduce tasks move from the shuffle phase into the reduce phase. In this final phase, the reduce function is called for each key and its group of values to process the intermediate data and write the final output.
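The following single-process sketch traces this data flow for word count: map emits (word, 1) pairs, each pair is routed to a reduce partition, each partition is sorted and grouped by key (the shuffle-and-sort step), and reduce sums each group. Using `zlib.crc32` modulo the number of reducers as the partition function is an assumption that plays the role of Hadoop's default hash partitioner, which uses the key's `hashCode()` instead. The names `map_fn`, `reduce_fn`, `partition`, and `run_job` are hypothetical.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def map_fn(line: str):
    """User map function: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def reduce_fn(word: str, counts):
    """User reduce function: sum the counts for one key."""
    return word, sum(counts)


def partition(key: str, num_reducers: int) -> int:
    # Stand-in for a hash partitioner: deterministic hash modulo reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(lines, num_reducers: int = 2) -> dict[str, int]:
    # Map phase: route each intermediate pair to its reducer's partition.
    partitions = [[] for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_fn(line):
            partitions[partition(key, num_reducers)].append((key, value))

    # Shuffle/sort, then reduce: each reducer sorts its partition by key,
    # groups equal keys, and calls reduce_fn once per group.
    output = {}
    for part in partitions:
        part.sort(key=itemgetter(0))
        for key, group in groupby(part, key=itemgetter(0)):
            word, total = reduce_fn(key, (v for _, v in group))
            output[word] = total
    return output


print(run_job(["the cat sat", "the cat ran"]))
```

Because every occurrence of a key hashes to the same partition, each key is reduced by exactly one reducer, so the per-reducer results can simply be combined.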
Different sources describe Hadoop map and reduce work at different granularities: tasks, subtasks, phases, and subphases. A map task consists of two subtasks, map and merge, whereas a reduce task consists of a single reduce subtask, which is preceded by shuffle and sort steps performed by the framework. Each subtask is in turn divided into subphases such as read-map, spill, merge, copy-map, and reduce-write.
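The map-side spill and merge subphases can be illustrated with a simplified model: map output is collected in a bounded in-memory buffer; each time the buffer fills, it is sorted and spilled as a sorted run; when the map finishes, the runs are merged into one sorted output. Measuring the buffer capacity in records is a hypothetical simplification. Hadoop sizes the buffer in bytes, spills at a configurable fill threshold, writes spills to local disk, and sorts by partition as well as by key.

```python
import heapq


def map_side_spill_and_merge(records, buffer_capacity: int):
    """Simulate map-side spilling: sort and spill the buffer whenever it
    reaches buffer_capacity records, then k-way merge all spill runs."""
    if buffer_capacity <= 0:
        raise ValueError("buffer_capacity must be positive")
    spills = []  # each spill is a sorted list standing in for a spill file
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_capacity:
            spills.append(sorted(buffer))  # spill subphase
            buffer = []
    if buffer:  # flush the final, partially filled buffer
        spills.append(sorted(buffer))
    # Merge subphase: combine the sorted runs into one sorted stream.
    merged = list(heapq.merge(*spills))
    return spills, merged


pairs = [("b", 1), ("a", 1), ("d", 1), ("c", 1), ("a", 1)]
spills, merged = map_side_spill_and_merge(pairs, buffer_capacity=2)
print(len(spills), merged)  # 3 [('a', 1), ('a', 1), ('b', 1), ('c', 1), ('d', 1)]
```

A k-way merge is used because each spill is already sorted, so the final output can be produced in one pass over the runs rather than by re-sorting everything.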