Selecting models of optimum complexity
There are different ways of selecting a model of the right complexity, so that its prediction error on unseen data is as low as possible. Let's discuss each of these approaches in the context of the linear regression model.
Subset selection
In the subset selection approach, one includes in the model only a subset of the full set of variables, namely those that are significant. This can improve the prediction accuracy of the model by reducing its variance, and it also makes the model easier to interpret. There are different ways of doing subset selection, but the following two are the most commonly used approaches:
Forward selection: In forward selection, one starts with no variables (the intercept alone) and, using a greedy algorithm, adds variables one at a time. At each step, the variable that most improves the fit is added to the model.
Backward selection: In backward selection, one starts with the full model and sequentially deletes the...
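As a rough illustration of the forward selection procedure described above, here is a minimal Python/NumPy sketch. It assumes that "most improves the fit" means the largest reduction in the residual sum of squares (RSS) of an ordinary least-squares fit with an intercept. The function names rss_of_fit and forward_selection, the max_vars cap, and the synthetic data are hypothetical choices made for this example, not part of the original text.

```python
import numpy as np

def rss_of_fit(X, y, cols):
    """RSS of an OLS fit of y on an intercept plus the columns `cols` of X."""
    n = X.shape[0]
    # Design matrix: a column of ones (intercept) followed by the chosen predictors.
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_selection(X, y, max_vars=None):
    """Greedy forward selection.

    Returns the column indices in the order they were added, and the RSS
    of the model after each step (the first entry is the intercept-only model).
    `max_vars` is a hypothetical cap on the number of variables to add.
    """
    p = X.shape[1]
    max_vars = p if max_vars is None else min(max_vars, p)
    selected, remaining = [], list(range(p))
    path = [rss_of_fit(X, y, selected)]  # start from the intercept alone
    while remaining and len(selected) < max_vars:
        # Try each remaining variable and keep the one giving the lowest RSS.
        best = min(remaining, key=lambda j: rss_of_fit(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        path.append(rss_of_fit(X, y, selected))
    return selected, path

# Synthetic example (assumed data): y depends only on columns 2 and 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
order, rss_path = forward_selection(X, y, max_vars=3)
print("order added:", order)
print("RSS path:", [round(r, 1) for r in rss_path])
```

Because each step fits a model that contains the previous one, the training RSS can only decrease as variables are added, so it cannot by itself tell you when to stop. In practice the number of variables is usually chosen by evaluating the prediction error on held-out data, or with a criterion such as AIC; the fixed max_vars cap here is only a placeholder for that decision.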