Regularization with XGBoost
Following the recipe that introduced boosting and the use of XGBoost for classification, let's now look at how to regularize such models. We will use the same Titanic dataset and try to improve the test accuracy.
Getting ready
Just like Random Forest, an XGBoost model is made of decision trees. Consequently, it has hyperparameters such as the maximum depth of the trees (max_depth) or the number of trees (n_estimators) that allow us to regularize it in the same way. It also has several other hyperparameters related to the decision trees that can be fine-tuned (see the sketch after the following list):
- subsample: The fraction of samples to randomly draw for training each tree, equivalent to max_samples for scikit-learn's random forests. A smaller value may add regularization.
- colsample_bytree: The fraction of features to randomly draw for each tree (equivalent to scikit-learn's max_features). A smaller value may add regularization.
- colsample_bylevel: The fraction of features to randomly draw at each depth level within a tree. A smaller value may add regularization.
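To make this concrete, here is a minimal sketch of how these hyperparameters can be passed to XGBClassifier. The synthetic data is a hypothetical stand-in for the preprocessed Titanic features from the previous recipe, and the values chosen (0.8 ratios, a depth of 3) are illustrative placeholders rather than tuned settings:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Stand-in binary classification data (hypothetical); replace with the
# preprocessed Titanic features and labels from the previous recipe.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ratios below 1.0 for subsample/colsample_* introduce random subsampling,
# which may add regularization; smaller max_depth and n_estimators also
# constrain the model.
model = XGBClassifier(
    n_estimators=100,       # number of trees
    max_depth=3,            # maximum depth of each tree
    subsample=0.8,          # fraction of samples drawn per tree
    colsample_bytree=0.8,   # fraction of features drawn per tree
    colsample_bylevel=0.8,  # fraction of features drawn per tree level
)
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))

Lowering subsample and the colsample_* ratios means each tree (or each tree level) sees a different random slice of the data, an effect similar to the bagging used by random forests.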