Linear regression API with Lasso and 'auto' optimization selection in Spark 2.0
In this recipe, we build on the recipe LinearRegression
by explicitly selecting LASSO regression via setElasticNetParam(1.0)
while letting Spark 2.0 choose the optimization method on its own with setSolver("auto").
Note again that the RDD-based regression API is now in maintenance mode; this DataFrame-based API is the preferred method going forward.
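For reference, Spark's MLlib guide describes the regularized least-squares objective that LinearRegression minimizes roughly as follows, with $\lambda$ set by regParam and $\alpha$ set by elasticNetParam. This is a simplified form; details such as feature standardization and intercept handling are omitted here.

```latex
\min_{w,\,b}\; \frac{1}{2n}\sum_{i=1}^{n}\left(w^{\top}x_i + b - y_i\right)^2
  \;+\; \lambda\left(\alpha\,\lVert w\rVert_1 + \frac{1-\alpha}{2}\,\lVert w\rVert_2^2\right),
  \qquad \alpha \in [0,1],\; \lambda \ge 0
```

With $\alpha = 1$ only the L1 penalty $\lambda\lVert w\rVert_1$ remains, which is Lasso; with $\alpha = 0$ only the L2 penalty $\tfrac{\lambda}{2}\lVert w\rVert_2^2$ remains, which is ridge regression. Values in between give an elastic net mix.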
How to do it...
- We use a housing data set from the UCI Machine Learning Repository.
- Download the entire data set from the following URLs:
The data set consists of 14 columns: the first 13 are the independent variables (that is, features) that try to explain the median price (that is, the last column) of an owner-occupied house in Boston, USA.
We selected and cleaned the first eight columns to use as features. We use the first 200 rows to train and predict the median...
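A minimal sketch of the column and row selection described above, written in plain Scala with no Spark dependency. It assumes each row of the file is a line of 14 whitespace-separated numbers; the object and helper names (HousingRows, parseRow, trainingRows) are hypothetical and not part of the recipe or of Spark.

```scala
// Hypothetical helper for illustration only.
// Assumption: each data row is one line of 14 whitespace-separated numbers.
object HousingRows {
  // Leading columns kept as features (the recipe uses the first eight).
  val NumFeatures = 8
  // Rows used for training (the recipe uses the first 200).
  val NumTrainRows = 200
  // Total columns expected per row: 13 features plus the median price.
  val NumColumns = 14

  // Splits one line into (features, label): the first eight columns become
  // the feature array and the last (14th) column, the median price, is the label.
  def parseRow(line: String): (Array[Double], Double) = {
    val cols = line.trim.split("\\s+").map(_.toDouble)
    require(cols.length == NumColumns, s"expected $NumColumns columns, got ${cols.length}")
    (cols.take(NumFeatures), cols.last)
  }

  // Skips blank lines, keeps the first NumTrainRows rows, and parses each one.
  def trainingRows(lines: Seq[String]): Seq[(Array[Double], Double)] =
    lines.filter(_.trim.nonEmpty).take(NumTrainRows).map(parseRow)
}
```

For example, parsing the made-up line "1 2 3 4 5 6 7 8 9 10 11 12 13 24.5" yields the features 1.0 through 8.0 and the label 24.5. In a Spark program these (features, label) pairs would typically be converted into a DataFrame before fitting LinearRegression.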