5.3 Measures of predictive accuracy
“Everything should be made as simple as possible, but not simpler” is a quote often attributed to Einstein. As with a healthy diet, modeling requires balance. Ideally, we want a model that neither underfits nor overfits the data, one that balances simplicity against goodness of fit.
In the previous example, it is relatively easy to see that the model of order 0 is too simple, while the model of order 5 is too complex. To get a general approach for ranking models, we need to formalize our intuition about this balance between simplicity and accuracy.
Let’s look at a couple of terms that will be useful to us:
Within-sample accuracy: The accuracy measured with the same data used to fit the model.
Out-of-sample accuracy: The accuracy measured with data not used to fit the model.
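To make the two definitions concrete, here is a small simulation sketch. It is not the book's code, and several choices are assumptions made only for illustration. Accuracy is measured as mean squared error (MSE), so a lower error means higher accuracy. The data come from a hypothetical quadratic curve with Gaussian noise. Polynomials are fitted by ordinary least squares rather than by a Bayesian model. The helper names `fit_poly`, `mse`, and `simulate` are hypothetical. For each of many simulated datasets, we fit polynomials of orders 0, 1, 2, and 5 to a training set, then score each fit on that same training set (within-sample) and on a fresh test set drawn from the same process (out-of-sample).

```python
import random


def fit_poly(xs, ys, order):
    """Least-squares polynomial fit by solving the normal equations (X^T X) b = X^T y."""
    n = order + 1
    # A = X^T X and c = X^T y, where column i of X holds x**i
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    c = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    # Back-substitution on the upper-triangular system
    coefs = [0.0] * n
    for i in reversed(range(n)):
        coefs[i] = (c[i] - sum(A[i][k] * coefs[k] for k in range(i + 1, n))) / A[i][i]
    return coefs  # coefs[i] multiplies x**i


def mse(coefs, xs, ys):
    """Mean squared error of the polynomial with the given coefficients."""
    preds = [sum(b * x ** i for i, b in enumerate(coefs)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs)


def simulate(orders=(0, 1, 2, 5), n=10, reps=500, seed=0):
    """Average within-sample and out-of-sample MSE over many simulated datasets."""
    rng = random.Random(seed)

    def true_f(x):
        # Hypothetical data-generating curve (an assumption for this sketch)
        return 1 + 2 * x - 1.5 * x ** 2

    def draw():
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, 0.5) for x in xs]
        return xs, ys

    within = {k: 0.0 for k in orders}
    out = {k: 0.0 for k in orders}
    for _ in range(reps):
        x_train, y_train = draw()  # data used to fit the model
        x_test, y_test = draw()    # fresh data not used in the fit
        for k in orders:
            coefs = fit_poly(x_train, y_train, k)
            within[k] += mse(coefs, x_train, y_train) / reps
            out[k] += mse(coefs, x_test, y_test) / reps
    return within, out


within, out = simulate()
for k in within:
    print(f"order {k}: within-sample MSE {within[k]:.3f}, out-of-sample MSE {out[k]:.3f}")
```

For every order, the average out-of-sample error should exceed the within-sample error. The within-sample error can only shrink as the order grows, because each higher-order model contains the lower-order ones. The out-of-sample error has no such guarantee. That asymmetry is why within-sample fit alone cannot tell us when to stop adding complexity.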
The within-sample accuracy will, on average, be greater than the out-of-sample accuracy. That is...