Generalized linear regression in Spark 2.0
This recipe covers the generalized linear model (GLM) implementation in Spark 2.0. The GeneralizedLinearRegression API in Spark 2.0 closely parallels the glmnet implementation in R. It is a welcome addition that lets you select and set both the distribution family (for example, Gaussian) and the link function (for example, log or inverse) through a coherent, well-designed API.
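To make the family/link choice concrete, here is a minimal Scala sketch, assuming Spark 2.x with spark-mllib on the classpath. The object name GlrFamilyLinkSketch and the four-row dataset are hypothetical. The setters setFamily, setLink, setMaxIter and setRegParam belong to org.apache.spark.ml.regression.GeneralizedLinearRegression. The data is constructed so that label = 1 + 2 * x exactly, which makes the fitted values easy to check.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.{GeneralizedLinearRegression, GeneralizedLinearRegressionModel}
import org.apache.spark.sql.SparkSession

object GlrFamilyLinkSketch {

  // Fits a GLM on a tiny hypothetical dataset where label = 1 + 2 * x exactly.
  def fit(spark: SparkSession): GeneralizedLinearRegressionModel = {
    val training = spark.createDataFrame(Seq(
      (3.0, Vectors.dense(1.0)),
      (5.0, Vectors.dense(2.0)),
      (7.0, Vectors.dense(3.0)),
      (9.0, Vectors.dense(4.0))
    )).toDF("label", "features")

    // Family and link are configured independently. Other supported pairs
    // include ("poisson", "log") and ("gamma", "inverse").
    val glr = new GeneralizedLinearRegression()
      .setFamily("gaussian") // distribution assumed for the response
      .setLink("identity")   // g(mu) = mu, i.e. ordinary linear regression
      .setMaxIter(10)
      .setRegParam(0.0)      // no L2 penalty

    glr.fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("GLR family/link sketch")
      .getOrCreate()

    val model = fit(spark)
    // With the exact linear data above, these should be close to 1.0 and [2.0]
    println(s"Intercept: ${model.intercept}")
    println(s"Coefficients: ${model.coefficients}")

    spark.stop()
  }
}
```

Switching to another model, such as setFamily("poisson").setLink("log") for count data, changes only those two setter calls. Spark checks that the family/link pair is supported when fit is called.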
How to do it...
- We use a housing dataset from the UCI Machine Learning Repository.
- Download the entire dataset from the following URLs:
The dataset consists of 14 columns...
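The download-and-load steps above can be sketched in Scala as follows. The download URL is not reproduced here, so the file path is passed in as a hypothetical command-line argument. The parsing rules (one whitespace-delimited record per line, the last of the 14 columns used as the label and the first 13 as features) are assumptions for illustration, not the recipe's stated layout. The object name HousingLoaderSketch is likewise hypothetical.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}

object HousingLoaderSketch {

  // Assumption: 14 whitespace-separated numeric columns per line,
  // with the last column treated as the label.
  val NumColumns = 14

  // Parses one text line into a (label, features) pair.
  def parseLine(line: String): (Double, Vector) = {
    val values = line.trim.split("\\s+").map(_.toDouble)
    require(values.length == NumColumns,
      s"expected $NumColumns columns but found ${values.length}")
    (values.last, Vectors.dense(values.init))
  }

  // Loads the raw text file into a DataFrame with the "label" and
  // "features" columns that GeneralizedLinearRegression expects by default.
  def load(spark: SparkSession, path: String): DataFrame = {
    val rows = spark.sparkContext
      .textFile(path)
      .filter(_.trim.nonEmpty) // skip blank lines
      .map(parseLine)
    spark.createDataFrame(rows).toDF("label", "features")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("Housing loader sketch")
      .getOrCreate()

    // args(0): hypothetical local path to the downloaded data file
    val df = load(spark, args(0))
    df.show(5)
    println(s"Rows loaded: ${df.count()}")

    spark.stop()
  }
}
```

Keeping parseLine separate from Spark makes the column assumptions easy to check on a single line before running the full load.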