Ensembling decision trees – gradient boosted trees
Boosting, another ensemble technique, takes an iterative approach instead of combining multiple learners in parallel. Rather than training individual trees independently, boosted trees train them in sequence. Specifically, in gradient boosted trees (GBT) (also called gradient boosting machines), each tree is trained in succession and aims to correct the errors made by the trees before it. The following two diagrams illustrate the difference between random forest and GBT:
Random forest builds each tree independently using a different subset of the dataset, and then combines the results at the end by a majority vote or by averaging:
Figure 4.14: The random forest workflow
The GBT model builds one tree at a time and combines the results along the way:
Figure 4.15: The GBT workflow
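To make the sequential error-correction idea concrete, here is a minimal sketch of gradient boosting with squared loss, where each new regression tree is fit to the residuals of the ensemble built so far. The synthetic dataset, tree depth, learning rate, and number of rounds below are illustrative assumptions, not taken from the book:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
trees = []
prediction = np.zeros_like(y)        # start from a zero prediction

for _ in range(50):
    residual = y - prediction         # errors made by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)             # the new tree learns to correct those errors
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

# The ensemble prediction is the shrunken sum of all trees' outputs
print(f'Training MSE: {np.mean((y - prediction) ** 2):.4f}')

With squared loss, the residuals are exactly the negative gradients of the loss, which is where the "gradient" in gradient boosting comes from.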
We will use the XGBoost package (https://xgboost.readthedocs.io/en/latest/) to implement GBT. We first install the XGBoost...
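As a brief sketch of what this can look like once XGBoost is installed (for example, with pip install xgboost), the following uses its scikit-learn-style XGBClassifier; the dataset and hyperparameters here are illustrative assumptions rather than the book's example:

# Illustrative sketch only -- the dataset and hyperparameters are assumptions
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# n_estimators trees are added sequentially, each shrunk by learning_rate
model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print(f'Test accuracy: {accuracy_score(y_test, pred):.4f}')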