When we introduced classification, we stressed the importance of cross-validation for checking the quality of our predictions. In regression, this step is often skipped; in fact, we have discussed only training errors in this chapter so far.
This is a mistake if you want to confidently estimate how well the model generalizes. However, since ordinary least squares is a very simple model, the consequences are often minor: the amount of overfitting tends to be slight. We should still verify this empirically, which is easy to do with scikit-learn.
We will use the KFold class to build a five-fold cross-validation loop and test the generalization ability of linear regression:
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict
import numpy as np

kf = KFold(n_splits=5)
# Each sample is predicted by a model trained on the other four folds
p = cross_val_predict(lr, x, y, cv=kf)
rmse_cv = np.sqrt(mean_squared_error(y, p))
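To see just how slight the overfitting is, we can compare this cross-validated error with the training error. The following is a minimal sketch, assuming lr, x, and y are the linear regression model and data used earlier in the chapter:

# Training error: score predictions on the same data the model was fit to
lr.fit(x, y)
rmse_train = np.sqrt(mean_squared_error(y, lr.predict(x)))

print('RMSE on training: {:.2f}'.format(rmse_train))
print('RMSE on 5-fold CV: {:.2f}'.format(rmse_cv))

The training RMSE will typically be a little lower, since cross-validation never lets a model score the samples it was trained on; if the two numbers are close, the model is barely overfitting. Note also that KFold does not shuffle by default, so if the rows of the dataset are ordered in some meaningful way, passing shuffle=True (with a random_state for reproducibility) gives a fairer estimate.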