Linear regression API with Lasso and L-BFGS in Spark 2.0
In this recipe, we demonstrate Spark 2.0's LinearRegression() API, a unified, parameterized API that handles linear regression comprehensively and can be extended without the backward-compatibility issues of the RDD-based, separately named APIs. We show how to use setSolver() to set the optimization method to L-BFGS, a memory-efficient quasi-Newton method that needs only first-order (gradient) information and can handle a large number of parameters (especially in sparse configurations) with ease.
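As a minimal sketch of the kind of call this recipe builds toward, the following shows a LinearRegression estimator configured through its setters. It is not the recipe's own code: the object name, the tiny in-memory dataset, and the parameter values (100 iterations, a regularization strength of 0.01) are assumptions chosen only for illustration, and the elastic net parameter is set explicitly to 1.0 to request a pure L1 (Lasso) penalty.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Hypothetical object name; any name works.
object LinearRegressionSolverSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LinearRegressionSolverSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up (label, features) rows, NOT the housing data used in the recipe.
    val training = Seq(
      (3.1, Vectors.dense(1.0, 0.5)),
      (4.9, Vectors.dense(2.0, 0.1)),
      (7.2, Vectors.dense(3.0, 0.9)),
      (8.8, Vectors.dense(4.0, 0.3)),
      (11.1, Vectors.dense(5.0, 0.7))
    ).toDF("label", "features")

    val lr = new LinearRegression()
      .setMaxIter(100)          // assumed iteration cap
      .setRegParam(0.01)        // assumed regularization strength
      .setElasticNetParam(1.0)  // 1.0 = pure L1 (Lasso); the default 0.0 = pure L2
      .setSolver("l-bfgs")      // select the L-BFGS family of optimizers

    val model = lr.fit(training)
    println(s"Coefficients: ${model.coefficients}")
    println(s"Intercept: ${model.intercept}")

    spark.stop()
  }
}
```

Because every option is a setter on the same estimator, switching the penalty or the solver is a one-line change rather than a switch to a differently named class, which is the extensibility the paragraph above refers to.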
Note
In this recipe, .setSolver() is set to l-bfgs, which makes L-BFGS (see RDD-based regression for more detail) the selected optimization method. If .setElasticNetParam() is not set, the default of 0 remains in effect, which applies a pure L2 (ridge) penalty; setting it to 1 applies a pure L1 penalty, which is what makes the model a Lasso regression.
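For reference, the role of the elastic net parameter can be read off the objective that Spark's documentation gives for regularized linear regression. In the notation assumed here, $\lambda$ is the value passed to .setRegParam(), $\alpha$ is the value passed to .setElasticNetParam(), $n$ is the number of rows, and $w$ is the coefficient vector:

```latex
\min_{w}\; \frac{1}{2n} \sum_{i=1}^{n} \left( x_i^{\top} w - y_i \right)^2
  + \lambda \left[ \frac{1-\alpha}{2} \lVert w \rVert_2^2 + \alpha \lVert w \rVert_1 \right]
```

With $\alpha = 1$ only the $\lVert w \rVert_1$ term survives, giving Lasso; with $\alpha = 0$ only the $\lVert w \rVert_2^2$ term survives, giving ridge regression; values in between blend the two.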
How to do it...
- We use a housing dataset from the UCI Machine Learning Repository.
- Download the entire data set from the following URLs: