Regression model evaluation using Spark 2.0
In this recipe, we explore how to evaluate a regression model (a regression decision tree in this example). Spark provides the RegressionMetrics class, which offers basic statistical measures such as Mean Squared Error (MSE) and R-squared right out of the box.
The objective in this recipe is to understand the evaluation metrics provided by Spark out of the box.
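To make the metrics concrete before moving on to Spark, the following is a minimal plain-Scala sketch of the standard formulas for MSE, RMSE, MAE, and R-squared. It is not Spark's implementation: the `MetricsSketch` object and its sample data are hypothetical names introduced only for illustration. The pairs are ordered as (prediction, observation), which is the ordering RegressionMetrics expects for its input.

```scala
// Plain-Scala sketch of the formulas behind common regression metrics.
// MetricsSketch and its sample data are hypothetical; they are not part of Spark.
object MetricsSketch {

  // (prediction, observation) pairs, made up for illustration.
  val sample: Seq[(Double, Double)] =
    Seq((2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0))

  // Mean Squared Error: the average of the squared residuals.
  def mse(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (p, o) => math.pow(p - o, 2) }.sum / pairs.size

  // Root Mean Squared Error: the square root of MSE, in the units of the label.
  def rmse(pairs: Seq[(Double, Double)]): Double =
    math.sqrt(mse(pairs))

  // Mean Absolute Error: the average of the absolute residuals.
  def mae(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (p, o) => math.abs(p - o) }.sum / pairs.size

  // R-squared: 1 - (residual sum of squares / total sum of squares).
  def r2(pairs: Seq[(Double, Double)]): Double = {
    val meanObserved = pairs.map(_._2).sum / pairs.size
    val ssErr = pairs.map { case (p, o) => math.pow(o - p, 2) }.sum
    val ssTot = pairs.map { case (_, o) => math.pow(o - meanObserved, 2) }.sum
    1.0 - ssErr / ssTot
  }

  def main(args: Array[String]): Unit = {
    println(s"MSE  = ${mse(sample)}")
    println(s"RMSE = ${rmse(sample)}")
    println(s"MAE  = ${mae(sample)}")
    println(s"R2   = ${r2(sample)}")
  }
}
```

For the sample data, the squared residuals sum to 1.5 over four points, so the MSE is 0.375 and the MAE is 0.5. In Spark, the corresponding values are exposed by RegressionMetrics as `meanSquaredError`, `rootMeanSquaredError`, `meanAbsoluteError`, and `r2`, computed over an RDD of (prediction, observation) pairs rather than a local sequence.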
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter4
- Import the necessary packages for the evaluation metrics, the decision tree model, and the SparkSession that gives access to the cluster:
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.sql.SparkSession
- Create Spark's configuration and SparkContext...