Model training and evaluation
The following code snippet will train two regressors, CatBoost and Random Forest:
cb_mdl = cb.CatBoostRegressor(
    depth=7, learning_rate=0.2, random_state=rand, verbose=False
)
cb_mdl = cb_mdl.fit(X_train, y_train)
rf_mdl = ensemble.RandomForestRegressor(n_jobs=-1, random_state=rand)
rf_mdl = rf_mdl.fit(X_train.to_numpy(), y_train.to_numpy())
Next, we can evaluate the CatBoost model using a regression plot and a few metrics. Run the following code, which will output Figure 4.1:
mdl = cb_mdl
y_train_pred, y_test_pred = mldatasets.evaluate_reg_mdl(
    mdl, X_train, X_test, y_train, y_test
)
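To make the metrics concrete, here is a minimal sketch of how RMSE and R-squared are conventionally computed from true and predicted values. It does not show the internals of mldatasets.evaluate_reg_mdl; we only assume that the function reports these two metrics under their standard definitions. The helper names rmse and r_squared and the toy values are hypothetical and are not taken from the chapter's dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values for illustration only (assumption: not the chapter's data)
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

toy_rmse = rmse(y_true, y_pred)
toy_r2 = r_squared(y_true, y_pred)
print(f"RMSE: {toy_rmse:.3f}, R-squared: {toy_r2:.3f}")
```

RMSE is expressed in the same units as the target, so it reads as a typical prediction error, whereas R-squared is unitless and measures the share of the target's variance that the model explains.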
The CatBoost model produced a high R-squared of 0.94 and a test RMSE of nearly 3,100. The regression plot in Figure 4.1 tells us that although quite a few cases have an extremely high error, the vast majority of the 64,000 test samples were predicted fairly well. You can confirm this by running the following code:
thresh = 4000...