As an alternative to the selection methods discussed in the previous sections (forward, backward, stepwise), it is possible to adopt methods that keep all the predictors but constrain the coefficient estimates, shrinking them toward very small or zero values (shrinkage). Because some coefficients can be driven exactly to zero, these methods can also act as a form of automatic feature selection, and they improve generalization. They are called regularization methods and work by modifying the performance function, normally chosen as the sum of the squared regression errors on the training set, through the addition of a penalty term.
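To make the modified performance function concrete, the following is a minimal NumPy sketch of a penalized objective; the function name, the use of an L2 (ridge-style) penalty, and the parameter name lam are illustrative assumptions, not a prescribed implementation:

```python
import numpy as np

def penalized_loss(beta, X, y, lam):
    """Sum of squared errors plus an L2 (ridge-style) penalty.

    beta : coefficient vector
    X    : matrix of predictors
    y    : response vector
    lam  : regularization strength (lam = 0 recovers least squares)
    """
    residuals = y - X @ beta
    sse = np.sum(residuals ** 2)        # the usual performance function
    penalty = lam * np.sum(beta ** 2)   # grows with the coefficient sizes
    return sse + penalty
```

Minimizing this objective discourages large coefficients: the larger lam is, the more strongly the estimates are pulled toward zero.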
When a large number of variables is available, the least squares estimates of a linear model often have low bias but high variance compared with models that use fewer variables. Under these conditions, as we have seen in previous sections, an overfitting problem arises. To improve prediction accuracy, it can therefore be worthwhile to accept a somewhat greater bias in exchange for a substantial reduction in variance, which is precisely the trade-off that shrinkage methods exploit.
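A small experiment makes this trade-off concrete: with few observations and many correlated predictors, least squares produces large, unstable coefficients, while shrinkage keeps them small at the cost of some bias. Below is a sketch using scikit-learn; the data-generating settings and the regularization strengths (alpha values) are illustrative assumptions chosen only to show the effect:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)

# Few observations, many correlated predictors: the setting in which
# least squares has low bias but high variance (overfitting).
n, p = 40, 30
X = rng.normal(size=(n, p))
X += 0.9 * X[:, [0]]              # correlate every predictor with the first
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)    # can set some coefficients exactly to zero

print("largest |coefficient|, OLS:  ", np.abs(ols.coef_).max())
print("largest |coefficient|, ridge:", np.abs(ridge.coef_).max())
print("coefficients set to zero, lasso:", int(np.sum(lasso.coef_ == 0)))
```

The lasso's ability to zero out coefficients is what justifies describing shrinkage as a form of automatic feature selection.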