Exploring ensemble classifiers
In Chapter 13, Applied Machine Learning: Identifying Credit Default, we learned how to build an entire machine learning pipeline, which contained both preprocessing steps (imputing missing values, encoding categorical features, and so on) and a machine learning model. Our task was to predict customer default, that is, their inability to repay their debts. We used a decision tree model as the classifier.
Decision trees are considered simple models and one of their drawbacks is overfitting to the training data. They belong to the group of high-variance models, which means that a small change to the training data can greatly impact the tree’s structure and its predictions. To overcome those issues, they can be used as building blocks for more complex models. Ensemble models combine predictions of multiple base models (for example, decision trees) in order to improve the final model’s generalizability and robustness. This way, they transform...