Binary classification model evaluation using Spark 2.0
In this recipe, we demonstrate the use of the BinaryClassificationMetrics facility in Spark 2.0 and its application to evaluating a model that has a binary outcome (for example, a logistic regression).
The purpose here is not to showcase the regression itself, but to show how to evaluate it using common metrics such as the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), metrics computed at each decision threshold, and so on.
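The following plain-Scala sketch (no Spark involved) computes ROC points and AUC by hand from a handful of made-up (score, label) pairs, which makes these metrics concrete. The names `RocSketch`, `rocCurve`, and `areaUnderRoc` are hypothetical illustrations, not part of Spark. BinaryClassificationMetrics does equivalent bookkeeping over an RDD, but its internals (for example, optional binning of thresholds) differ.

```scala
// Plain Scala, no Spark: ROC points and AUC computed from (score, label) pairs.
// Hypothetical names. Assumes labels are 1.0 (positive) or 0.0 (negative) and
// that a higher score means "more likely positive".
object RocSketch {

  // One (falsePositiveRate, truePositiveRate) point per distinct score threshold,
  // highest threshold first, preceded by the (0, 0) starting point. The lowest
  // threshold classifies everything as positive, so the curve ends at (1, 1).
  def rocCurve(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scoreAndLabels.count(_._2 == 1.0).toDouble
    val negatives = scoreAndLabels.size - positives
    val thresholds = scoreAndLabels.map(_._1).distinct.sortWith(_ > _)
    val points = thresholds.map { t =>
      // Predict "positive" whenever score >= threshold.
      val predictedPositive = scoreAndLabels.filter(_._1 >= t)
      val tp = predictedPositive.count(_._2 == 1.0)
      val fp = predictedPositive.size - tp
      (fp / negatives, tp / positives)
    }
    (0.0, 0.0) +: points
  }

  // Area under the ROC curve using the trapezoidal rule between consecutive points.
  def areaUnderRoc(roc: Seq[(Double, Double)]): Double =
    roc.zip(roc.tail).map { case ((x1, y1), (x2, y2)) =>
      (x2 - x1) * (y1 + y2) / 2.0
    }.sum

  def main(args: Array[String]): Unit = {
    // Made-up scores and labels, for illustration only.
    val scoreAndLabels = Seq((0.9, 1.0), (0.8, 1.0), (0.7, 0.0), (0.6, 1.0), (0.2, 0.0))
    val roc = rocCurve(scoreAndLabels)
    roc.foreach { case (fpr, tpr) => println(f"FPR=$fpr%.2f TPR=$tpr%.2f") }
    println(f"AUC=${areaUnderRoc(roc)}%.4f")
  }
}
```

Running `main` prints one FPR/TPR pair per distinct score and then `AUC=0.8333`. Five of the six positive/negative pairs are ranked in the right order, and (ignoring ties) AUC equals the fraction of such correctly ranked pairs.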
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the necessary packages for SparkSession to get access to the cluster, along with the MLlib classes used for training and evaluation:
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
...
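The following minimal sketch shows one way the classes imported in this step fit together. It is not the recipe's own program. The object name `BinaryMetricsSketch`, the `evaluate` helper, and the six-point toy dataset are hypothetical. The sketch assumes Spark 2.0 MLlib is on the classpath (the sbt line in the first comment is an assumed setup) and additionally imports `org.apache.spark.mllib.linalg.Vectors` to build feature vectors. It trains `LogisticRegressionWithLBFGS`, clears the model's threshold so `predict` returns probabilities, and passes (score, label) pairs to `BinaryClassificationMetrics`.

```scala
// Assumed build dependency (sbt, Spark 2.0.x on Scala 2.11); adjust to your cluster:
//   libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.0"
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical object and helper names; the toy data is made up for illustration.
object BinaryMetricsSketch {

  // Trains a binary logistic regression on a tiny dataset and returns its metrics.
  def evaluate(spark: SparkSession): BinaryClassificationMetrics = {
    val data = spark.sparkContext.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.1)),
      LabeledPoint(0.0, Vectors.dense(0.4)),
      LabeledPoint(1.0, Vectors.dense(0.35)),
      LabeledPoint(0.0, Vectors.dense(0.5)),
      LabeledPoint(1.0, Vectors.dense(0.8)),
      LabeledPoint(1.0, Vectors.dense(0.9))
    ))

    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(data)
    // Without a threshold, predict() returns the raw probability instead of 0.0/1.0.
    model.clearThreshold()

    // BinaryClassificationMetrics expects an RDD of (score, label) pairs.
    val scoreAndLabels = data.map(p => (model.predict(p.features), p.label))
    new BinaryClassificationMetrics(scoreAndLabels)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("BinaryMetricsSketch")
      .getOrCreate()

    val metrics = evaluate(spark)
    println(s"Area under ROC = ${metrics.areaUnderROC()}")
    println(s"Area under PR  = ${metrics.areaUnderPR()}")
    metrics.roc().collect().foreach { case (fpr, tpr) => println(s"FPR=$fpr TPR=$tpr") }
    metrics.fMeasureByThreshold().collect().foreach { case (t, f1) =>
      println(s"threshold=$t F1=$f1")
    }

    spark.stop()
  }
}
```

Clearing the threshold matters. With the default 0.5 cut-off, `predict` returns only 0.0 or 1.0, so the ROC curve would reduce to a single operating point between (0, 0) and (1, 1) instead of sweeping across thresholds.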