Algorithm considerations
In this section, we will address the question of how a data scientist should decide which of the many machine learning and statistical algorithms should be chosen to solve a particular problem. We assume some prior familiarity with statistical and machine learning models such as logistic regression, decision trees, random forests, and gradient boosting models.
As outlined in Chapter 4, H2O Model Building at Scale – Capability Articulation H2O provides multiple supervised and unsupervised learning algorithms that can be used to build models. For example, in the case of a binary classification problem, a data scientist could choose a parametric GLM model (logistic regression); semiparametric GAM; nonparametric tree-based approaches such as Random Forest, GBM, XGBoost, or RuleFit; models from the machine learning community such as Support Vector Machines (SVMs) or Deep Learning Neural Networks; or the simple Naïve Bayes Classifier. To complicate...