Gradient boosting – ensembles for most tasks
AdaBoost can also be interpreted as a stagewise forward approach to minimizing an exponential loss function for a binary outcome, y ∈ {-1, +1}, that identifies a new base learner, h_m, at each iteration, m, together with its weight, α_m, and adds it to the ensemble, as shown in the following formula:

$$(\alpha_m, h_m) = \underset{\alpha,\, h}{\arg\min} \sum_{i=1}^{N} \exp\left(-y_i \left[H_{m-1}(x_i) + \alpha\, h(x_i)\right]\right), \qquad H_m(x) = H_{m-1}(x) + \alpha_m h_m(x)$$
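To make the stagewise view concrete, the following sketch traces a single AdaBoost iteration under exponential loss, which yields the familiar closed-form learner weight and observation re-weighting. It assumes labels coded as -1/+1 and a hypothetical fit_weak_learner helper that fits a base learner on weighted data and returns its prediction function; it is an illustration of the update, not a production implementation.

```python
import numpy as np

def adaboost_step(X, y, sample_weights, fit_weak_learner):
    """One stagewise iteration of AdaBoost under exponential loss.

    Assumes y takes values in {-1, +1} and that fit_weak_learner is a
    hypothetical helper that fits a base learner on weighted data and
    returns a function mapping X to predictions in {-1, +1}.
    """
    h_m = fit_weak_learner(X, y, sample_weights)  # new base learner h_m
    pred = h_m(X)

    # Weighted error rate of h_m, clipped to keep the log finite
    eps = np.clip((sample_weights * (pred != y)).sum() / sample_weights.sum(),
                  1e-10, 1 - 1e-10)

    # Closed-form weight that minimizes the exponential loss at this stage
    alpha_m = 0.5 * np.log((1 - eps) / eps)

    # Re-weight observations: misclassified points gain weight
    new_weights = sample_weights * np.exp(-alpha_m * y * pred)
    new_weights /= new_weights.sum()

    return h_m, alpha_m, new_weights
```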
This interpretation of AdaBoost as a gradient descent algorithm that minimizes a particular loss function, namely exponential loss, was only discovered several years after its original publication.
Gradient boosting leverages this insight and applies the boosting method to a much wider range of loss functions. The method enables the design of machine learning algorithms to solve any regression, classification, or ranking problem, as long as the problem can be formulated with a differentiable loss function and thus has a gradient. Common example loss functions for different tasks include the following (a minimal sketch using the regression loss appears after this list):
- Regression: The mean-squared error
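To illustrate how a differentiable loss drives the generic procedure, the sketch below hand-rolls gradient boosting for regression under squared-error loss: each round fits a shallow tree to the negative gradient of the loss, which for MSE is simply the current residual. It assumes NumPy arrays for X and y and uses scikit-learn's DecisionTreeRegressor as the base learner; the function and parameter names are illustrative rather than a production implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_mse(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for regression under squared-error loss."""
    init = y.mean()                                # initial constant prediction
    prediction = np.full(len(y), init)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction                 # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                     # fit base learner to the pseudo-residuals
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return init, trees

def gbm_predict(X, init, trees, learning_rate=0.1):
    """Sum the shrunken contributions of all fitted trees."""
    prediction = np.full(X.shape[0], init)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction
```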