Lasso regression with SGD optimization in Spark 2.0
In this recipe, we will use the housing dataset from the previous recipes to demonstrate shrinkage with Spark's RDD-based lasso regression, LassoWithSGD(), which can select a subset of parameters by setting the other weights to zero (hence eliminating some parameters based on the threshold) while reducing the effect of others (regularization). We emphasize again that ridge regression reduces the parameter weights but never sets them to zero.
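To make the difference between the two kinds of shrinkage concrete, here is a minimal, Spark-free sketch. It is not Spark's implementation: the object name ShrinkageSketch, the helper names softThreshold and ridgeShrink, and the sample weights and penalty value are all made up for illustration. It applies a soft-thresholding step (the usual way an L1 penalty is handled) and a simple proportional ridge-style shrink to the same weights.

```scala
object ShrinkageSketch {
  // Lasso-style step (soft-thresholding): any weight whose magnitude is
  // at or below the threshold becomes exactly zero; larger weights move
  // toward zero by the threshold amount.
  def softThreshold(w: Double, threshold: Double): Double =
    math.signum(w) * math.max(math.abs(w) - threshold, 0.0)

  // Ridge-style step: every weight is scaled toward zero, but a nonzero
  // weight never becomes exactly zero.
  def ridgeShrink(w: Double, lambda: Double): Double =
    w / (1.0 + lambda)

  def main(args: Array[String]): Unit = {
    val weights = Array(2.5, -0.3, 0.05, -1.2) // hypothetical weights
    val penalty = 0.5                           // hypothetical penalty strength

    val lasso = weights.map(softThreshold(_, penalty))
    val ridge = weights.map(ridgeShrink(_, penalty))

    println("lasso: " + lasso.mkString(", "))
    println("ridge: " + ridge.mkString(", "))
  }
}
```

With these sample values, the two small weights are zeroed by the lasso-style step, while the ridge-style step leaves all four weights smaller but nonzero.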
LassoWithSGD() is Spark's RDD-based lasso (Least Absolute Shrinkage and Selection Operator) API. Lasso is a regression method that performs both variable selection and regularization at the same time in order to eliminate non-contributing explanatory variables (that is, features), therefore enhancing the prediction's accuracy. Lasso, which is based on Ordinary Least Squares (OLS), can be easily extended to other methods, such as Generalized Linear Models (GLM).
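For reference, the relationship to OLS can be written out as objective functions. The symbols here are our own notation rather than the recipe's: X is the feature matrix, y the label vector, n the number of samples, w the weight vector, and \lambda the regularization strength (what the Spark API calls regParam). The lasso objective is the least-squares loss plus an L1 penalty, and ridge is shown alongside for comparison with its squared L2 penalty.

```latex
\begin{aligned}
f_{\text{lasso}}(w) &= \frac{1}{2n}\,\lVert Xw - y \rVert_2^2 + \lambda\,\lVert w \rVert_1 \\
f_{\text{ridge}}(w) &= \frac{1}{2n}\,\lVert Xw - y \rVert_2^2 + \frac{\lambda}{2}\,\lVert w \rVert_2^2
\end{aligned}
```

Setting \lambda = 0 in either expression recovers the plain least-squares objective. The absolute-value form of the L1 term is what allows individual weights to reach exactly zero.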
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice...