Introduction
This chapter, along with the next chapter, covers the fundamental techniques for regression and classification available in the Spark 2.0 ML and MLlib libraries. Spark 2.0 marks a new direction by moving the RDD-based regressions (see the next chapter) to maintenance mode while emphasizing Linear Regression and Generalized Regression going forward.
At a high level, the new API design favors parameterization of elastic net to produce ridge regression, Lasso regression, and everything in between, as opposed to a named API (for example, LassoWithSGD). The new API approach is a much cleaner design and forces you to learn elastic net and its power when it comes to feature engineering, which remains an art in data science. We provide adequate examples, variations, and notes to guide you through the complexities of these techniques.
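To make the idea of parameterization concrete, the following is a minimal sketch in plain Python (not Spark code) of the elastic net penalty term. The function name elastic_net_penalty and the sample weights are hypothetical and chosen only for illustration. The two arguments reg_param and elastic_net_param are meant to mirror the roles of the regParam and elasticNetParam settings on the Spark ML LinearRegression estimator, under the assumption that the penalty takes the usual form regParam * (alpha * L1 + (1 - alpha) / 2 * L2 squared):

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Illustrative elastic net penalty (hypothetical helper, not a Spark API).

    Assumed form: reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2),
    where alpha is elastic_net_param and must lie in [0, 1].
    """
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    l1 = sum(abs(w) for w in weights)          # Lasso (L1) component
    l2_squared = sum(w * w for w in weights)   # ridge (L2) component
    alpha = elastic_net_param
    return reg_param * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


# Sample weight vector, made up for this illustration.
weights = [3.0, -4.0]

ridge = elastic_net_penalty(weights, reg_param=0.5, elastic_net_param=0.0)  # pure L2
lasso = elastic_net_penalty(weights, reg_param=0.5, elastic_net_param=1.0)  # pure L1
mixed = elastic_net_penalty(weights, reg_param=0.5, elastic_net_param=0.5)  # in between

print(ridge, lasso, mixed)
```

Setting the mixing parameter to 0 leaves only the ridge term, setting it to 1 leaves only the Lasso term, and any value in between blends the two. This is why a single parameterized estimator can stand in for several separately named APIs.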
The following figure depicts the regression and classification coverage (part 1) in this chapter:
First, you will learn how to implement linear regression using algebraic equations...