Generalized linear regression in Spark 2.0
This recipe covers the generalized linear model (GLM) implementation in Spark 2.0. There is a close parallel between Spark 2.0's GeneralizedLinearRegression and the glmnet implementation in R. This API is a welcome addition. It lets you select and set both the distribution family (for example, Gaussian) and the link function (for example, inverse or log) through a well-designed interface.
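The family/link distinction is easier to see at prediction time. The snippet below is a plain-Python sketch, not Spark code, and it leaves out model fitting. It shows what the link function does once coefficients are known:

1. The model computes the linear predictor eta = intercept + x · beta.
2. It passes eta through the inverse link g⁻¹ to get the predicted mean.

In Spark you choose these settings with `setFamily(...)` and `setLink(...)` on `GeneralizedLinearRegression`, for example `"gaussian"` with `"identity"`. The names `INVERSE_LINK` and `glm_predict` are hypothetical and used only for illustration.

```python
import math

# Hypothetical table of inverse link functions g^{-1}. Each maps the linear
# predictor eta back to the mean of the response. The keys mirror link names
# used by Spark's GeneralizedLinearRegression; this is an illustration only.
INVERSE_LINK = {
    "identity": lambda eta: eta,
    "log": lambda eta: math.exp(eta),
    "inverse": lambda eta: 1.0 / eta,  # undefined when eta == 0
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
}


def glm_predict(features, coefficients, intercept, link):
    """Return the predicted mean mu = g^{-1}(intercept + x . beta)."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    # Linear predictor: intercept plus the dot product of features and coefficients.
    eta = intercept + sum(x * b for x, b in zip(features, coefficients))
    return INVERSE_LINK[link](eta)
```

Take an intercept of 0, features `[1.0, 2.0]` and coefficients `[0.5, 0.25]`. Then eta is 1.0. The identity link returns 1.0, while the log link returns e (about 2.718). The linear part is identical; only the link changes how it maps to the response.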
How to do it...
- We use a housing dataset from the UCI Machine Learning Repository.
- Download the entire dataset from the following URLs:
The dataset consists of 14 columns. The first 13 are the independent variables (that is, features) used to explain the median price (that is, the last column) of owner-occupied homes in Boston, USA.
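The column layout above translates directly into a parsing step. The sketch below is plain Python, not the recipe's Spark code, and rests on three assumptions:

- The downloaded file is whitespace-separated, with 14 numeric values per row.
- A prefix of the columns becomes the features and the last column becomes the label.
- Malformed rows are skipped. This stands in for the recipe's cleaning step, which is not spelled out here.

The names `parse_housing_rows` and `NUM_COLUMNS` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed layout: 13 feature columns followed by the median price label.
NUM_COLUMNS = 14


def parse_housing_rows(lines: Iterable[str], num_features: int = 13
                       ) -> List[Tuple[List[float], float]]:
    """Split whitespace-separated rows into (features, label) pairs.

    Keeps the first `num_features` columns as features and the last column
    (median price) as the label. Blank lines, rows without exactly
    NUM_COLUMNS fields, and rows with non-numeric fields are skipped;
    this is an assumed cleaning rule, not necessarily the recipe's.
    """
    if not 1 <= num_features <= NUM_COLUMNS - 1:
        raise ValueError("num_features must be between 1 and 13")
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != NUM_COLUMNS:
            continue  # blank or truncated row
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # non-numeric field
        rows.append((values[:num_features], values[-1]))
    return rows
```

Calling `parse_housing_rows(f, num_features=8)[:200]` on an open file handle `f` keeps eight feature columns and the first 200 rows. The file path you open depends on where you saved the download, since the URLs are not reproduced here.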
We selected and cleaned the first eight columns to use as features. We use the first 200 rows to train and predict the...