Conducting predictive analytics using Spark MLlib
Spark ships with a rich machine learning library called MLlib. It is a collection of algorithms for classification, clustering, recommendation, and more. In this recipe, we are going to take a look at how to build a predictive model using MLlib.
Getting ready
To perform this recipe, you should have Hadoop and Spark installed. You also need Scala installed; here, I am using Scala 2.11.0.
How to do it...
For this recipe, we are going to use the classic iris flower dataset; you can find out more about it at https://en.wikipedia.org/wiki/Iris_flower_data_set.
Based on the petal length and width and the sepal length and width, we need to classify each flower into one of three species. First, we build a model from the data, and then we run the model on test records to predict their species.
To start with, download iris.txt from https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/iris.txt.
Next, save it in HDFS.
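Before stepping through the recipe, the following is a minimal sketch of what such a pipeline can look like with Spark's RDD-based MLlib API. The HDFS path, the comma-separated column layout with a numeric species label in the last field, and the choice of a decision tree classifier are assumptions made for illustration and are not necessarily what the recipe uses later:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree

object IrisClassification {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("IrisClassification")
    val sc = new SparkContext(conf)

    // Assumption: each line of iris.txt is comma-separated as
    // sepal_length,sepal_width,petal_length,petal_width,label
    // where label is the species encoded as 0, 1, or 2.
    // The HDFS path below is also assumed for illustration.
    val data = sc.textFile("hdfs:///data/iris.txt")

    val parsed = data.map { line =>
      val parts = line.split(',')
      val features = Vectors.dense(parts.slice(0, 4).map(_.toDouble))
      LabeledPoint(parts(4).toDouble, features)
    }

    // Hold out part of the data to test the model.
    val Array(training, test) = parsed.randomSplit(Array(0.7, 0.3), seed = 11L)

    // Train a decision tree classifier for the three iris species.
    val model = DecisionTree.trainClassifier(
      training,
      numClasses = 3,
      categoricalFeaturesInfo = Map[Int, Int](),
      impurity = "gini",
      maxDepth = 5,
      maxBins = 32)

    // Predict on the held-out records and compute a simple accuracy score.
    val predictionsAndLabels = test.map(p => (model.predict(p.features), p.label))
    val accuracy =
      predictionsAndLabels.filter { case (prediction, label) => prediction == label }.count().toDouble / test.count()
    println(s"Test accuracy: $accuracy")

    sc.stop()
  }
}
```

The split into a training and a test set mirrors the build-then-test flow described above: the model is fit only on the training portion, and its predictions on the unseen portion give a rough measure of how well it generalizes.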
We start the...