The Mahout and Spark integration
Apache Mahout is a general-purpose machine learning library that was originally built on top of Hadoop. Mahout started out primarily as a Java MapReduce package for running machine learning algorithms. Because machine learning algorithms are iterative in nature, MapReduce suffered from major performance and scalability issues, so the Mahout project stopped developing MapReduce-based algorithms and began supporting new platforms, such as Spark, H2O, and Flink, with a new package called Samsara.
Let's install Mahout, explore the Mahout shell with Scala bindings, and then build a recommendation system.
Installing Mahout
The latest version of Spark does not yet work well with Mahout, so I used Spark 1.4.1 with Mahout 0.12.2. Download the Spark prebuilt binary from the following location, extract it, and start the Spark daemons:
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.1-bin-hadoop2.6.tgz
tar xzvf spark-1.4.1-bin-hadoop2.6.tgz
cd spark-1.4.1-bin-hadoop2.6
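The commands above only download and extract Spark. To start the Spark daemons for a single-node standalone cluster, a minimal sketch (assuming the sbin scripts shipped with the prebuilt binary and a master running on localhost:7077) might look like this:

# From inside the spark-1.4.1-bin-hadoop2.6 directory
export SPARK_HOME=$(pwd)

# Start the standalone master (web UI on http://localhost:8080)
$SPARK_HOME/sbin/start-master.sh

# Start a worker and register it with the master
$SPARK_HOME/sbin/start-slave.sh spark://localhost:7077

# Confirm that both daemons are running
jps | grep -E 'Master|Worker'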
Now, let's download the Mahout binaries...