Generalization, overfitting, and the role of model complexity
What do we mean by a complex model? Very loosely, we think of a more complex model as having more parameters or using more features. This statement is imprecise, but the idea that model complexity broadly follows the number of model parameters/features will be precise enough for the mainly qualitative discussions of this chapter.
A more complex model can fit a training dataset more closely, as it can use the extra features to explain the variation in the response variable/target variable. What are the consequences of this increased flexibility? As a simple example, we’ll take a look at Figure 8.1, which shows three different models fitted to a small dataset. The black circles in each plot show the training data, while the blue circles show the hold-out sample data points, which, as you can see, represent an extrapolation challenge, since the hold-out data points are all to the right of the training data points...