According to the Spark MLlib guide (see https://spark.apache.org/docs/latest/ml-guide.html), as of Spark 2.0 the RDD-based APIs in the spark.mllib package have entered maintenance mode and are expected to be removed; the primary machine learning API is now the DataFrame-based API in the spark.ml package. In this project, we import several classes from this newer library to build the anomaly detection model. The following code block shows a few lines from the AnomalyDetection class in the com.example.esanalytics.spark.mllib package:
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
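To make the role of these three classes concrete, the following is a minimal, self-contained sketch of how they typically fit together: VectorAssembler combines raw numeric columns into a single feature vector, and KMeans fits a clustering model whose cluster assignments (or distances to cluster centers) can later serve as the basis for flagging anomalies. The column names, sample values, and k=2 are illustrative assumptions, not the project's actual configuration:

```java
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

public class KMeansSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("anomaly-detection-sketch")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical metrics with two numeric feature columns.
        StructType schema = new StructType(new StructField[]{
                new StructField("cpu", DataTypes.DoubleType, false, Metadata.empty()),
                new StructField("mem", DataTypes.DoubleType, false, Metadata.empty())
        });
        List<Row> rows = Arrays.asList(
                RowFactory.create(0.20, 0.30),
                RowFactory.create(0.25, 0.28),
                RowFactory.create(0.22, 0.31),
                RowFactory.create(8.00, 9.00)  // a point far from the others
        );
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Assemble the raw columns into the single "features" vector
        // column that Spark ML estimators expect.
        VectorAssembler assembler = new VectorAssembler()
                .setInputCols(new String[]{"cpu", "mem"})
                .setOutputCol("features");
        Dataset<Row> features = assembler.transform(df);

        // Fit a k-means model; k=2 and the fixed seed are illustrative.
        KMeans kmeans = new KMeans()
                .setK(2)
                .setSeed(1L)
                .setFeaturesCol("features")
                .setPredictionCol("cluster");
        KMeansModel model = kmeans.fit(features);

        // Each row now carries a "cluster" column; points assigned to a
        // sparse or distant cluster are candidate anomalies.
        model.transform(features).show();

        spark.stop();
    }
}
```

In the DataFrame-based API, assembling features and fitting the model are both column-to-column transformations, which is what lets such stages be chained later in a Pipeline if desired.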
The following diagram illustrates the steps for building the model with ES-Hadoop, Spark SQL, and Spark MLlib:
The step-by-step instructions are as follows:
- There are two major...