Introduction
This chapter, the second half of our coverage of regression and classification in Spark 2.0, highlights RDD-based regression, which is still used in many existing Spark ML implementations. Because of that existing code base, intermediate and advanced practitioners are expected to be able to work with these techniques.
In this chapter, you will learn how to implement a small application using the Apache Spark API. The recipes cover various regressions (linear, logistic, ridge, and lasso) trained with Stochastic Gradient Descent (SGD) and L-BFGS, alongside linear yet powerful classifiers such as Support Vector Machines (SVM) and Naive Bayes. We augment each recipe with sample fit measurements when appropriate (for example, MSE, RMSE, ROC, and binary and multi-class metrics) to demonstrate the completeness of Spark MLlib. We first introduce RDD-based linear, logistic, ridge, and lasso regression, and then discuss SVM and Naive Bayes to demonstrate more sophisticated classifiers.
The following diagram...