Streaming linear regression for real-time regression
In this recipe, we will use the wine quality dataset from UCI and Spark's streaming linear regression algorithm from MLlib to predict the quality of a wine based on a group of wine features.
Unlike the traditional regression recipes we saw earlier, this recipe uses Spark MLlib's streaming support to score the quality of the wine in real time with a linear regression model.
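To make the idea of a streaming regression model more concrete, here is a minimal, Spark-free sketch in plain Scala. It keeps a single weight vector and refines it with one gradient step per incoming mini-batch, so the model can score new data at any point. The names `StreamingRegressionSketch`, `Point`, `predict`, and `update`, the toy data, and the step size are all assumptions made for illustration; this is not MLlib's implementation, which differs in its details.

```scala
// Illustrative only: a hand-rolled mini-batch update, not Spark's API.
object StreamingRegressionSketch {
  // One labeled example: a target value and its feature vector.
  final case class Point(label: Double, features: Array[Double])

  // Linear prediction as a dot product (no intercept term, for brevity).
  def predict(weights: Array[Double], features: Array[Double]): Double =
    weights.zip(features).map { case (w, x) => w * x }.sum

  // One gradient-descent step on the squared error of a single batch,
  // with the gradient averaged over the batch.
  def update(weights: Array[Double], batch: Seq[Point], stepSize: Double): Array[Double] =
    if (batch.isEmpty) weights // nothing arrived in this interval
    else {
      val gradient = Array.fill(weights.length)(0.0)
      for (p <- batch) {
        val error = predict(weights, p.features) - p.label
        for (j <- weights.indices) gradient(j) += error * p.features(j)
      }
      weights.indices.map(j => weights(j) - stepSize * gradient(j) / batch.size).toArray
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical stream: every batch follows label = 2 * x1 + 3 * x2.
    val batch = Seq(
      Point(2.0, Array(1.0, 0.0)),
      Point(3.0, Array(0.0, 1.0)),
      Point(5.0, Array(1.0, 1.0)))
    var weights = Array(0.0, 0.0)
    for (_ <- 1 to 200) {
      // Score with the current model first, then learn from the batch.
      val scores = batch.map(p => predict(weights, p.features))
      weights = update(weights, batch, stepSize = 0.5)
    }
    println(weights.mkString(", "))
  }
}
```

In this sketch, scoring happens before the update on each batch, which is the usual pattern in a streaming setting: predictions are made with the model as it stands, and the model then improves as labeled data arrives.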
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter13
- Import the necessary packages:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}
import org.apache...