Streaming logistic regression for an online classifier
In this recipe, we will use the Pima Diabetes dataset we downloaded in the previous recipe, together with Spark's streaming logistic regression algorithm with SGD, to predict whether a Pima individual with a given set of features will test positive for diabetes. The result is an online classifier: it learns from the streamed data as it arrives and makes predictions on that stream.
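To make the idea of an online classifier concrete, here is a minimal sketch in plain Scala of logistic regression trained one example at a time with SGD. It is not Spark's implementation and does not use the Pima data; the object name `OnlineLogisticSketch`, its helper methods, the step size of 0.1, and the toy stream are all assumptions made up for illustration.

```scala
// Minimal sketch of online logistic regression with SGD (plain Scala, no Spark).
// All names and the toy data below are hypothetical, for illustration only.
object OnlineLogisticSketch {
  // Logistic function: maps a raw score to a probability in (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Dot product of the weight vector and a feature vector.
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (a, b) => a * b }.sum

  // One SGD step on a single example; label must be 0.0 or 1.0.
  def update(w: Array[Double], label: Double, x: Array[Double],
             stepSize: Double): Array[Double] = {
    val error = sigmoid(dot(w, x)) - label
    w.zip(x).map { case (wi, xi) => wi - stepSize * error * xi }
  }

  // Classify as 1.0 when the predicted probability is at least 0.5.
  def predict(w: Array[Double], x: Array[Double]): Double =
    if (sigmoid(dot(w, x)) >= 0.5) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    // Toy "stream" of (label, features); the first feature is a bias term.
    val stream = Seq.fill(200)(Seq(
      (1.0, Array(1.0, 2.0)),
      (0.0, Array(1.0, -2.0))
    )).flatten

    // The model is updated after every example, as an online learner would be.
    val weights = stream.foldLeft(Array(0.0, 0.0)) {
      case (w, (y, x)) => update(w, y, x, 0.1)
    }

    println(predict(weights, Array(1.0, 3.0)))   // prints 1.0
    println(predict(weights, Array(1.0, -3.0)))  // prints 0.0
  }
}
```

The fold stands in for the stream: the weights after each example are the model available for the next prediction. Spark's streaming variant applies the same principle to batches of records arriving over time rather than to single examples.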
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter13
- Import the necessary packages:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD...