Performing logistic regression using H2O
Generalized linear models (GLMs) are widely used for both regression and classification. They are fitted by maximum likelihood and scale well to large datasets. In H2O, GLM supports L1 and L2 penalties as well as their combination, the elastic net. It supports Gaussian, binomial, Poisson, and gamma distributions for the dependent variable. It handles categorical variables efficiently, can compute the full regularization path, and performs distributed n-fold cross-validation to control overfitting. Hyperparameters such as the elastic net mixing parameter (α) can be tuned with distributed grid searches, and upper and lower bounds can be placed on predictor coefficients. It can also impute missing values automatically. For optimization, H2O's GLM relies on solvers such as iteratively reweighted least squares (IRLSM) and L-BFGS; the Hogwild method, a lock-free parallel version of stochastic gradient descent, is what H2O uses for deep learning rather than for GLM.
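To make the role of the penalties concrete, the following is a minimal sketch in plain Python of a binomial GLM (logistic regression) with an elastic net penalty. It assumes the commonly used form of the objective, the mean negative log-likelihood plus λ(α‖β‖₁ + ½(1 − α)‖β‖₂²), and fits it with a simple proximal gradient loop. The function names, the toy data, and the choice of optimizer are all illustrative assumptions; this is not H2O's API and not the solver H2O uses internally.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def elastic_net_penalty(beta, lam, alpha):
    # lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm).
    # alpha = 1 gives the lasso, alpha = 0 gives ridge.
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def objective(X, y, intercept, beta, lam, alpha):
    # Mean negative log-likelihood of the binomial model plus the penalty.
    # The intercept is left unpenalized (an assumption of this sketch).
    nll = 0.0
    for xi, yi in zip(X, y):
        z = intercept + sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(z)) - y * z, written in a stable form.
        nll += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return nll / len(y) + elastic_net_penalty(beta, lam, alpha)

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrinks v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit(X, y, lam=0.1, alpha=0.5, step=0.1, iters=500):
    # Proximal gradient descent: a gradient step on the smooth part
    # (likelihood + L2 term), then soft-thresholding for the L1 term.
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = intercept + sum(b * x for b, x in zip(beta, xi))
            r = sigmoid(z) - yi  # residual on the probability scale
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        intercept -= step * g0 / n
        for j in range(p):
            v = beta[j] - step * (g[j] / n + lam * (1.0 - alpha) * beta[j])
            beta[j] = soft_threshold(v, step * lam * alpha)
    return intercept, beta

# Toy data (hypothetical): the first column separates the classes,
# the second is small noise.
X = [[-2.0, 0.1], [-1.0, -0.2], [-1.5, 0.3],
     [1.0, -0.1], [2.0, 0.2], [1.5, -0.3]]
y = [0, 0, 0, 1, 1, 1]

intercept, beta = fit(X, y, lam=0.1, alpha=0.5)
start = objective(X, y, 0.0, [0.0, 0.0], 0.1, 0.5)
final = objective(X, y, intercept, beta, 0.1, 0.5)
print("coefficients:", beta, "objective:", start, "->", final)
```

Raising α toward 1 puts more weight on the L1 term, which tends to push weak coefficients to exactly zero, while lowering it toward 0 shrinks coefficients smoothly without removing them. This is the trade-off that a grid search over α explores.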
Getting ready
The previous chapter provided the details for the installation of H2O...