Hadoop 2.x
Until Hadoop 2.x, all the distributions focused on addressing the limitations of Hadoop 1.x without deviating from the core architecture. Hadoop 2.x changed the underlying architectural assumptions and proved to be a real breakthrough, most importantly through the introduction of YARN (Yet Another Resource Negotiator). YARN is a framework for managing the resources of a Hadoop cluster; by separating resource management from the MapReduce programming model, it made it possible to run near-real-time and interactive workloads alongside batch jobs. Some important issues that were addressed are listed as follows:
- Single NameNode issues (a single point of failure and a scalability bottleneck)
- Limits on the number of nodes a cluster could grow to
- Limits on the range of tasks that could be successfully addressed with Hadoop
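The idea behind the last point can be sketched with a toy model. This is not the YARN API: the `Node` and `ToyResourceManager` classes, the first-fit placement rule, the memory figures, and the application names are all assumptions made for illustration. The sketch only shows that a resource manager that hands out generic containers does not need to know whether a batch job or some other kind of workload will run inside them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with a fixed amount of free memory (hypothetical stand-in)."""
    name: str
    free_mb: int


@dataclass
class ToyResourceManager:
    """Grants containers without caring what kind of application asks for them."""
    nodes: list
    grants: list = field(default_factory=list)

    def allocate(self, app: str, mb: int):
        # First-fit placement (an assumption for this sketch):
        # use the first node that still has enough free memory.
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                self.grants.append((app, node.name, mb))
                return node.name
        return None  # no node has enough free memory for this request


rm = ToyResourceManager([Node("node1", 4096), Node("node2", 2048)])
batch = rm.allocate("mapreduce-job", 3072)       # a batch workload
stream = rm.allocate("streaming-job", 2048)      # a non-batch workload
extra = rm.allocate("interactive-query", 2048)   # request made when the cluster is full
```

In this sketch the batch and streaming applications share the same pool of nodes, and the third request is refused only because no memory is left, not because of its type.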
The following figure depicts the difference between the Hadoop 1.x and 2.x architectures and shows how YARN sits between MapReduce and HDFS:
Hadoop ecosystem components
Hadoop has spawned a large number of auxiliary and supporting frameworks. The following figure depicts the range of supporting frameworks contributed by open source developer groups...