Model selection
The method of removing covariates in The multicollinearity problem section depended solely on the covariates themselves. More often, however, the covariates in the final model are selected with respect to the output. Computational cost is almost a non-issue these days, especially for not-so-large datasets. The question that then arises is whether one can retain all possible covariates in the model, or whether one should choose a subset of covariates that meets a certain regression metric, say R² > 60 percent.
The problem is that having more covariates increases the variance of the model, while having fewer of them leads to a large bias. The philosophical principle of Occam's Razor applies here too: among models that explain the data adequately, the simplest is preferred. In our context, the smallest model that fits the data is the best. There are two types of model selection: stepwise procedures and criterion-based procedures. In this section, we will consider both.
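
To make the trade-off concrete, the following is a minimal sketch of a criterion-based search, not the procedure developed in this section. It assumes simulated data in which only the first two of five covariates affect the output, and it scores every subset of covariates by R² and by the Gaussian AIC (up to an additive constant). The helper names `fit_ols`, `r_squared`, and `aic`, as well as the data and coefficients, are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (rss, number of coefficients)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), design.shape[1]

def r_squared(rss, y):
    """Coefficient of determination from the residual sum of squares."""
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n*log(rss/n) + 2k."""
    return n * np.log(rss / n) + 2 * k

# Simulated data (an assumption): only covariates 0 and 1 drive the output.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

# Score every one of the 2^p subsets, including the intercept-only model.
results = []
for size in range(p + 1):
    for subset in itertools.combinations(range(p), size):
        rss, k = fit_ols(X[:, list(subset)], y)
        results.append((subset, r_squared(rss, y), aic(rss, n, k)))

best_by_r2 = max(results, key=lambda r: r[1])
best_by_aic = min(results, key=lambda r: r[2])
print("Largest R^2 :", best_by_r2[0])
print("Smallest AIC:", best_by_aic[0])
```

Because R² can never decrease when a covariate is added, it always favors the full model, which is why it cannot serve as a selection criterion on its own. The AIC adds a penalty of 2 per coefficient, so a covariate is kept only if it reduces the residual sum of squares enough to pay for itself. An exhaustive search like this is feasible only for a small number of covariates, since the number of subsets doubles with each one added.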