MRv1 versus MRv2
MRv1 (MapReduce version 1) is part of Apache Hadoop 1.x and is an implementation of the MapReduce programming paradigm.
The MapReduce project itself can be broken into the following parts:
- End-user MapReduce API: This is the API that developers use to write MapReduce applications.
- MapReduce framework: This is the runtime implementation of various phases, such as the map phase, the sort/shuffle/merge aggregation phase, and the reduce phase.
- MapReduce system: This is the backend infrastructure required to run MapReduce applications, including cluster resource management and job scheduling.
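To make the split between the end-user API and the framework concrete, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the names `map_phase`, `shuffle_phase`, `reduce_phase`, and `run_job` are hypothetical stand-ins for the framework, while `word_count_mapper` and `word_count_reducer` play the role of the code an application developer would write.

```python
from collections import defaultdict


# --- "Framework" side: hypothetical stand-ins for the runtime phases ---

def map_phase(records, mapper):
    """Apply the user-supplied mapper to every input record."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group intermediate values by key and return the groups sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Apply the user-supplied reducer to each (key, values) group."""
    return [reducer(key, values) for key, values in groups]


def run_job(records, mapper, reducer):
    """Chain the three phases: map, then sort/shuffle, then reduce."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


# --- "End-user" side: the only code an application author supplies ---

def word_count_mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(word, counts):
    """Sum the counts collected for one word."""
    return (word, sum(counts))


if __name__ == "__main__":
    lines = ["Hadoop runs MapReduce", "MapReduce runs on Hadoop"]
    print(run_job(lines, word_count_mapper, word_count_reducer))
```

In a real cluster the phases run on many machines and the third part, the MapReduce system, decides where and when each task executes; this sketch deliberately leaves that part out.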
Hadoop 1.x was written solely as a MapReduce (MR) engine. Because it runs on a cluster, its cluster management component was tightly coupled with the MR programming paradigm. As a result, the only kind of workload that Hadoop 1.x could run was an MR job.
In MRv1, the cluster was managed by a single JobTracker and multiple TaskTrackers running on the DataNodes.
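The single-master arrangement can be pictured with a toy model. This is only an illustrative sketch, not Hadoop code: the classes below borrow the names JobTracker and TaskTracker, but their methods (`submit_job`, `schedule`, `has_free_slot`) and the fixed per-node slot count are simplifying assumptions made for the example.

```python
class TaskTracker:
    """Toy worker node with a fixed number of task slots (an assumption)."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.running = []

    def has_free_slot(self):
        return len(self.running) < self.slots


class JobTracker:
    """Toy master that both tracks cluster capacity and schedules tasks."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.pending = []

    def submit_job(self, job_name, num_tasks):
        """Queue one task name per unit of work in the job."""
        for i in range(num_tasks):
            self.pending.append(f"{job_name}-task-{i}")

    def schedule(self):
        """Hand pending tasks to trackers, in order, until slots run out."""
        assignments = {}
        for tracker in self.trackers:
            while self.pending and tracker.has_free_slot():
                task = self.pending.pop(0)
                tracker.running.append(task)
                assignments[task] = tracker.name
        return assignments


if __name__ == "__main__":
    job_tracker = JobTracker([TaskTracker("node1", 2), TaskTracker("node2", 1)])
    job_tracker.submit_job("wordcount", 4)
    print(job_tracker.schedule())
    print(job_tracker.pending)
```

The point of the sketch is that one object holds both responsibilities, resource bookkeeping and job scheduling, and that everything it schedules is assumed to be a MapReduce task.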
In Hadoop 2.x, the old MRv1 framework...