Summary
In this chapter, we learned how to help our models generalize. We explored several techniques for avoiding overfitting and for building models with low bias and low variance. We began by explaining the differences between overfitting and underfitting.
In general, overfitting occurs when an overly complex statistical model fits the observed data closely because it has too many parameters relative to the number of observations. The risk is that an incorrect model can fit the training data perfectly simply because it is so complex compared to the amount of data available. Consequently, when the model is used to predict new observations, it fails because it cannot generalize. Underfitting, by contrast, occurs when a regression algorithm cannot capture the underlying trend of the data, for example when a linear model is fitted to nonlinear data. Such a model has poor predictive performance.
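The contrast between the two failure modes can be made concrete with a small experiment. What follows is a minimal sketch in plain Python, not code from this chapter, and it rests on several assumptions: the underlying trend is taken to be sin(x) on [-3, 3]; there are 12 evenly spaced training points; a fixed alternating perturbation of ±0.3 stands in for random noise so the run is reproducible; and the helpers `fit_poly`, `fit_interpolant`, and `mse` are hypothetical names introduced for illustration. A straight line should underfit the curved trend, a cubic should track it, and a degree-11 polynomial passing through all 12 points (one parameter per observation) should reach zero training error but a large error on held-out points between the training inputs.

```python
import math

def true_fn(x):
    """Assumed underlying nonlinear trend that the models try to recover."""
    return math.sin(x)

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit by solving the normal equations (A^T A) w = A^T y.
    Returns a prediction function x -> w0 + w1*x + ... + wd*x^d."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the monomial basis 1, x, ..., x^d.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution for the coefficients.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return lambda x: sum(c * x ** k for k, c in enumerate(w))

def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point: as many parameters
    as observations, so it reproduces the training targets exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

def mse(model, xs, targets):
    """Mean squared error of a prediction function on (xs, targets)."""
    return sum((model(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)

n_train = 12
train_x = [-3 + 6 * i / (n_train - 1) for i in range(n_train)]
# Assumption: a deterministic alternating +/-0.3 perturbation stands in for noise.
train_y = [true_fn(x) + 0.3 * (-1) ** i for i, x in enumerate(train_x)]
# Held-out inputs halfway between training inputs, scored against the noise-free trend.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = [true_fn(x) for x in test_x]

models = {
    "degree 1": fit_poly(train_x, train_y, 1),          # too simple: underfits
    "degree 3": fit_poly(train_x, train_y, 3),          # matches the trend's shape
    "degree 11": fit_interpolant(train_x, train_y),     # one parameter per point: overfits
}
results = {name: (mse(m, train_x, train_y), mse(m, test_x, test_y))
           for name, m in models.items()}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

Comparing the two columns is the point of the sketch: the interpolant's training error is zero, yet its test error is the largest by far, mostly because it oscillates between the outermost training points, while the linear model does worse than the cubic on both sets.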
We then discovered the cross-validation procedure through...