A short introduction to Hadoop 1.x and MRv1
We will briefly look at the basics of Apache Hadoop 1.x and its processing framework, MRv1 (Classic), so that we have a clear picture of how Apache Hadoop 2.x and MRv2 (YARN) differ in terms of architecture, components, and processing framework.
Apache Hadoop is a scalable, fault-tolerant distributed system for data storage and processing. The core programming model in Hadoop is MapReduce.
Since 2004, Hadoop has emerged as the de facto standard to store, process, and analyze hundreds of terabytes and even petabytes of data.
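To make the MapReduce programming model concrete, the following is a minimal word-count-style map and reduce pair written against the classic MRv1 API (the `org.apache.hadoop.mapred` package). The class and field names are illustrative, not taken from this chapter; this is a sketch of the model, not a complete application.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Map phase: each input line is split into words and a (word, 1) pair is emitted per word.
public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, ONE);
        }
    }
}

// Reduce phase: the counts emitted for each word are summed into a single total.
class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
```

In MRv1, many parallel copies of the map function run close to the data blocks, and the framework shuffles and sorts their output before the reduce function aggregates it.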
The major components in Hadoop 1.x are as follows:
- NameNode: This keeps the HDFS metadata (the filesystem namespace and the block locations) in main memory.
- DataNode: This is where the actual file data resides, stored in the form of blocks.
- JobTracker: This assigns (and, on failure, reassigns) MapReduce tasks to the TaskTrackers in the cluster and tracks the status of each TaskTracker; the driver sketch after this list shows how a client submits a job to it.
- TaskTracker: This executes the tasks assigned by the JobTracker and reports their status back to the JobTracker.
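The following driver sketch ties these daemons together from a client's point of view: `fs.default.name` tells the job where the NameNode (and therefore HDFS) lives, `mapred.job.tracker` points at the JobTracker, and `JobClient.runJob()` hands the job to the JobTracker, which schedules its map and reduce tasks onto the TaskTrackers. The host names, ports, and paths are placeholders, and the mapper and reducer classes are the ones sketched earlier.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        // Placeholder addresses: the NameNode serves HDFS metadata,
        // the JobTracker accepts and schedules MapReduce jobs.
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        conf.set("mapred.job.tracker", "jobtracker-host:9001");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input blocks are read from the DataNodes; output is written back to HDFS.
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/output"));

        // Submits the job to the JobTracker and blocks until the
        // TaskTrackers have finished all map and reduce tasks.
        JobClient.runJob(conf);
    }
}
```

In practice these addresses usually come from core-site.xml and mapred-site.xml rather than being set in code; they are set explicitly here only to show which daemon each property refers to.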
The major components of...