Cross-validation strategies
Cross-validation is one of the most important tools for evaluating standard regression and classification methods, for two reasons:
- A simple holdout approach doesn’t use all the available data; when data is scarce, cross-validation makes better use of it.
- Theoretically, the time series we have observed is one realization of a stochastic process, so the error measure computed on that data is itself a random variable. It is therefore essential to sample multiple error estimates to get an idea of its distribution. Intuitively, we can think of this as a “lack of reliability” in an error measure derived from a single slice of data.
The most common strategy in standard machine learning is k-fold cross-validation. Under this strategy, we randomly shuffle the training data and partition it into k equal parts. Now,...
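The shuffle-and-partition step can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `k_fold_indices`, its parameters, and the fixed `seed` are assumptions made for this example, and it covers only the partitioning described above. When the number of samples is not divisible by k, this sketch gives the first few folds one extra sample each, so the parts are only nearly equal.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k near-equal folds.

    Hypothetical helper for illustration; not taken from any library.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    # Random shuffle, made reproducible by the (assumed) fixed seed.
    random.Random(seed).shuffle(indices)

    # Each fold gets `base` samples; the first `extra` folds get one more.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


folds = k_fold_indices(n_samples=10, k=5)
fold_sizes = [len(fold) for fold in folds]
```

Every index appears in exactly one fold, so the folds together cover the whole training set without overlap.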