Apache Mahout with Hadoop
Apache Mahout relies on Apache Hadoop, a distributed computing framework, to achieve scalability. The following figure shows where Apache Hadoop fits into Apache Mahout:
As shown in the previous figure, Yet Another Resource Negotiator (YARN), which handles data processing, and the Hadoop Distributed File System (HDFS), which handles data storage, are the key components of Apache Hadoop.
In this chapter, we will explain the important subcomponents of YARN and HDFS and their behavior in detail before proceeding to the Hadoop installation steps.
YARN with MapReduce 2.0
First, let's understand YARN, which was introduced in Apache Hadoop 2.0.
Earlier, Apache Hadoop operated with MapReduce 1.0. It used cluster resources inefficiently because each node had a statically allocated number of map slots and reduce slots, so a map task could not use an idle reduce slot, and vice versa.
YARN, together with MapReduce 2.0, overcomes this drawback by introducing a flexible resource allocation model based on containers.
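The difference between static slots and containers can be illustrated with a small toy model. The following sketch is not YARN's scheduler or any Hadoop API. The function names, the 8 GB node, and the 1 GB task size are all assumptions chosen for illustration. It only shows why a fixed split into map and reduce slots can leave capacity idle while a shared pool of containers does not.

```python
def static_slot_schedule(map_slots, reduce_slots, pending_maps, pending_reduces):
    """MapReduce 1.0 style (toy model): a map task may only occupy a map slot
    and a reduce task may only occupy a reduce slot."""
    running_maps = min(map_slots, pending_maps)
    running_reduces = min(reduce_slots, pending_reduces)
    return running_maps, running_reduces


def container_schedule(node_memory_mb, requests):
    """Container style (toy model): every request is granted from one shared
    memory pool, in arrival order, as long as enough memory is free."""
    granted = []
    free = node_memory_mb
    for kind, memory_mb in requests:
        if memory_mb <= free:
            granted.append((kind, memory_mb))
            free -= memory_mb
    return granted, free


# Hypothetical node: 8 GB in total, statically split into
# 4 map slots and 4 reduce slots of 1 GB each.
# Workload: 8 pending map tasks and no reduce tasks yet.
maps, reduces = static_slot_schedule(4, 4, pending_maps=8, pending_reduces=0)
static_used_mb = (maps + reduces) * 1024

# Same node and workload, but memory is handed out as containers.
granted, free_mb = container_schedule(8192, [("map", 1024)] * 8)
container_used_mb = 8192 - free_mb

print(f"static slots: {maps} maps running, {static_used_mb} MB of 8192 MB in use")
print(f"containers:   {len(granted)} maps running, {container_used_mb} MB of 8192 MB in use")
```

In this toy workload, the static split runs only four map tasks and leaves the four reduce slots idle, whereas the container model runs all eight. A real YARN cluster also accounts for CPU cores and applies a pluggable scheduling policy, which this sketch deliberately ignores.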
The YARN architecture consists of the following...