Introduction
As we saw in the previous chapter, decision trees and the ensemble models built on them are powerful methods for creating machine learning models. While random forests have been around for decades, recent work on a different kind of tree ensemble, gradient boosted trees, has produced state-of-the-art models that dominate the landscape of predictive modeling with tabular data: data organized into a structured table, like the case study data. The two main packages data scientists use today to build the most accurate predictive models on tabular data are XGBoost and LightGBM. In this chapter, we'll become familiar with XGBoost using a synthetic dataset, and then apply it to the case study data in the activity.
Note
Perhaps some of the best motivation for using XGBoost comes from the paper describing this machine learning system, in the context of Kaggle, a popular online platform for machine learning competitions...