Classification under the hood: gradient boosted decision trees
The ultimate goal of a classification task is to take previously unseen data points and infer which of several possible classes they belong to. We achieve this by taking a labeled training dataset containing a representative number of data points, extracting relevant features that allow us to learn a decision boundary, and encoding knowledge of that decision boundary into a classification model. The model then decides which class a given data point belongs to. How does the model learn to do this? That is the question we will try to answer in this section.
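Before we look under the hood, here is a minimal sketch of the train-and-predict workflow just described, using scikit-learn's GradientBoostingClassifier on synthetic data. The dataset, feature counts, and hyperparameters below are illustrative assumptions, not values from this book's running example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Labeled training data: feature vectors X and class labels y
# (synthetic here, standing in for real extracted features).
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Learn a decision boundary from the labeled examples.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

# Use the learned boundary to classify previously unseen data points.
predictions = model.predict(X_test)
print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")
```

The rest of this section unpacks what happens inside the `fit` call: how an ensemble of decision trees, built one after another, comes to represent that decision boundary.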
As is our habit throughout this book, let's start by exploring conceptually what tools humans use to navigate a set of complicated decisions. A familiar tool that many of us have used to make decisions involving several, possibly complex factors is...