Multiclass classification model evaluation using Spark 2.0
In this recipe, we explore MulticlassMetrics, which lets you evaluate a model that assigns each output to one of more than two labels (for example, red, blue, green, purple, do-not-know). It highlights the use of a confusion matrix (confusionMatrix) and model accuracy.
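MulticlassMetrics is constructed from (prediction, label) pairs and exposes, among other things, confusionMatrix and accuracy. The sketch below shows what those two results mean without a cluster. It computes both from a local Seq of pairs in plain Scala. This illustrates the definitions only and is not Spark's implementation. The object ConfusionSketch, its methods, and the sample pairs (labels 0.0, 1.0, 2.0 standing in for red, blue, green) are hypothetical.

```scala
// Hypothetical plain-Scala sketch (no Spark) of the two quantities this recipe
// reads from MulticlassMetrics: the confusion matrix and overall accuracy.
object ConfusionSketch {

  // Builds a confusion matrix from (prediction, label) pairs, the same pair
  // order MulticlassMetrics takes. Rows are actual labels and columns are
  // predicted labels, both in ascending label order.
  def confusionMatrix(predictionAndLabels: Seq[(Double, Double)]): (Vector[Double], Array[Array[Int]]) = {
    // Every distinct label seen as either a prediction or an actual label.
    val labels = predictionAndLabels
      .flatMap { case (p, l) => Seq(p, l) }
      .distinct
      .sortWith(_ < _)
      .toVector
    val index = labels.zipWithIndex.toMap
    val matrix = Array.fill(labels.size, labels.size)(0)
    for ((p, l) <- predictionAndLabels) {
      matrix(index(l))(index(p)) += 1 // row = actual, column = predicted
    }
    (labels, matrix)
  }

  // Accuracy is the fraction of pairs whose prediction equals the label,
  // i.e. the sum of the matrix diagonal divided by the total count.
  def accuracy(predictionAndLabels: Seq[(Double, Double)]): Double =
    if (predictionAndLabels.isEmpty) 0.0
    else predictionAndLabels.count { case (p, l) => p == l }.toDouble / predictionAndLabels.size

  def main(args: Array[String]): Unit = {
    // Hypothetical (prediction, label) pairs for a three-class problem.
    val pairs = Seq((0.0, 0.0), (1.0, 1.0), (1.0, 2.0), (2.0, 2.0), (0.0, 1.0), (2.0, 2.0))
    val (labels, matrix) = confusionMatrix(pairs)
    println(s"labels: ${labels.mkString(", ")}")
    matrix.foreach(row => println(row.mkString(" ")))
    // prints:
    // 1 0 0
    // 1 1 0
    // 0 1 2
    println(f"accuracy: ${accuracy(pairs)}%.4f") // prints "accuracy: 0.6667"
  }
}
```

Reading the matrix row by row shows where the model goes wrong. For example, one actual 2.0 was predicted as 1.0. The single accuracy number (4 correct out of 6) hides that detail.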
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the necessary packages for SparkSession to get access to the cluster, along with the MLlib classes used in this recipe:
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
- Create Spark's configuration and SparkSession:
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName...