Using XGBoost, LightGBM, and CatBoost
Over the last few years, several new gradient boosting implementations have emerged that use algorithmic innovations to accelerate training, improve resource efficiency, and scale the algorithm to very large datasets. These implementations and their sources are as follows:
- XGBoost: Started in 2014 by T. Chen during his Ph.D. (T. Chen and Guestrin 2016)
- LightGBM: Released in January 2017 by Microsoft (Ke et al. 2017)
- CatBoost: Released in April 2017 by Yandex (Prokhorenkova et al. 2019)
These innovations address specific challenges of training a gradient boosting model (see this chapter's README file on GitHub for links to the documentation). XGBoost was the first of the new implementations to gain popularity: among the 29 winning solutions published by Kaggle in 2015, 17 used XGBoost. Eight of these relied solely on XGBoost, while the others combined it with neural networks.
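All three libraries ship Python packages that expose scikit-learn-compatible estimators, so they can be dropped into the same training pipeline interchangeably. The sketch below is illustrative only: the synthetic dataset and the hyperparameter values (n_estimators, max_depth) are arbitrary choices for demonstration, not tuned settings.

```python
# Illustrative comparison of the three gradient boosting libraries on
# synthetic data; dataset and hyperparameters are arbitrary examples.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

# Generate a toy binary classification problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# All three estimators follow the scikit-learn fit/predict interface
models = {
    'XGBoost': XGBClassifier(n_estimators=100, max_depth=3),
    'LightGBM': LGBMClassifier(n_estimators=100, max_depth=3),
    'CatBoost': CatBoostClassifier(n_estimators=100, max_depth=3, verbose=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_score = model.predict_proba(X_test)[:, 1]
    print(f'{name}: test AUC = {roc_auc_score(y_test, y_score):.3f}')
```

Because the estimator interface is shared, model selection tools such as cross-validation and grid search work identically across the three libraries; the differences lie in their internal tree-growing strategies and resource handling, which we turn to next.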
We will...