Preparing data and base models
Before introducing and applying XGBoost hyperparameters, let's prepare by doing the following:
Getting the heart disease dataset
Building an XGBClassifier model
Implementing StratifiedKFold
Scoring a baseline XGBoost model
Combining GridSearchCV with RandomizedSearchCV to form one powerful function
Good preparation is essential for gaining accuracy, consistency, and speed when fine-tuning hyperparameters.
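The preparation steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's own code. The helper names (load_heart_disease, make_kfold, baseline_score, search) are hypothetical. We also assume that the target label is the last column of heart_disease.csv. The estimator is passed in as an argument, so an XGBClassifier or any other scikit-learn-compatible classifier can be plugged in.

```python
import pandas as pd
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    StratifiedKFold,
    cross_val_score,
)


def load_heart_disease(path):
    """Load the CSV and split it into features X and target y.

    Hypothetical helper; ASSUMES the target label is the last column.
    """
    df = pd.read_csv(path)
    X = df.iloc[:, :-1]  # every column except the last
    y = df.iloc[:, -1]   # last column as the label (assumption)
    return X, y


def make_kfold(n_splits=5, random_state=2):
    # StratifiedKFold keeps the class proportions of y roughly equal in
    # every fold; shuffling with a fixed seed makes the splits reproducible.
    return StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)


def baseline_score(model, X, y, kfold):
    # Mean cross-validated accuracy of an untuned model, used as a baseline.
    scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
    return scores.mean()


def search(model, params, X, y, kfold, random=False, n_iter=20,
           random_state=2, n_jobs=None):
    # One entry point for both search strategies:
    #   random=False -> GridSearchCV tries every combination in params
    #   random=True  -> RandomizedSearchCV samples n_iter combinations
    if random:
        grid = RandomizedSearchCV(
            model, params, n_iter=n_iter, cv=kfold, scoring="accuracy",
            n_jobs=n_jobs, random_state=random_state,
        )
    else:
        grid = GridSearchCV(model, params, cv=kfold, scoring="accuracy",
                            n_jobs=n_jobs)
    grid.fit(X, y)
    # Best hyperparameter combination and its mean cross-validated accuracy.
    return grid.best_params_, grid.best_score_
```

For example, with xgboost installed, model could be XGBClassifier(random_state=2) and params a dictionary such as {'max_depth': [2, 3, 5]}. Setting random=True switches from an exhaustive search to a sampled one. That matters once the grid grows too large to try every combination. Reusing the same kfold object for the baseline and for both searches keeps the scores comparable.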
The heart disease dataset
The dataset used throughout this chapter is the heart disease dataset originally presented in Chapter 2, Decision Trees in Depth. We have chosen the same dataset so that we can spend more time on hyperparameter fine-tuning and less time on data analysis. Let's begin the process:
Go to https://github.com/PacktPublishing/Hands-On-Gradient-Boosting-with-XGBoost-and-Scikit-learn/tree/master/Chapter06 to load heart_disease.csv into a DataFrame and display the...