Why does boosting work?
The Adaptive boosting algorithm section in the previous chapter described an ensemble of m classifiers built from n observations, with observation weights and per-model voting powers determined sequentially. The workings of the adaptive boosting method were illustrated using a toy example, and then applied using specialized functions. When compared with the bagging and random forest methods, we found that boosting provides the highest accuracy, as you may remember from the results in that section. However, implementing the algorithm does not tell us why we should expect it to perform better.
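As a refresher on the mechanics referred to above (sequentially updated observation weights and voting powers), here is a minimal AdaBoost.M1-style sketch using decision stumps as weak learners. This is an illustration under simplifying assumptions (binary labels in {-1, +1}, exhaustive stump search), not the specialized functions from the previous chapter; the names `fit_stump`, `stump_predict`, `adaboost`, and `predict` are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Find the decision stump (feature, threshold, sign) with the
    smallest weighted training error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best[1:]

def stump_predict(stump, X):
    j, t, s = stump
    return np.where(X[:, j] <= t, s, -s)

def adaboost(X, y, m=10):
    n = len(y)
    w = np.full(n, 1.0 / n)              # start with uniform weights
    models = []
    for _ in range(m):
        stump = fit_stump(X, y, w)       # weak learner on weighted data
        pred = stump_predict(stump, X)
        err = max(w[pred != y].sum(), 1e-12)   # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # voting power
        models.append((alpha, stump))
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified points
        w /= w.sum()                     # renormalize
    return models

def predict(models, X):
    # Weighted vote of all weak learners
    score = sum(a * stump_predict(s, X) for a, s in models)
    return np.sign(score)

# A tiny one-dimensional example: no single stump classifies it
# perfectly, but a few boosting rounds do.
X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([1, 1, -1, 1, -1, -1])
models = adaboost(X, y, m=5)
```

Note how each round concentrates the weights on the observations the current ensemble gets wrong; the voting power `alpha` rewards weak learners with low weighted error.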
We don't have a universally accepted answer to the question of why boosting works, but according to subsection 6.2.2 of Berk (2016), there are three possible explanations:
- Boosting is a margin maximizer
- Boosting is a statistical optimizer
- Boosting is an interpolator
But what do these actually mean? We will now cover each of these points one by one. The margin for an observation...