In this chapter, we started with the oldest trick in the book: ordinary least squares regression. Although centuries old, it is sometimes still the best solution for a regression problem. However, we also saw more modern approaches that avoid overfitting and can give us better results, especially when we have a large number of features. We used Ridge, Lasso, and Elastic Nets; these penalized methods remain state-of-the-art for regression.
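As a quick reminder of how these regressors are used in practice, here is a minimal sketch with scikit-learn. The diabetes dataset and the specific penalty strengths are illustrative assumptions, not the chapter's own data or tuned values:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# A small built-in regression dataset (illustrative choice only)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),                           # L2 penalty
    "Lasso": Lasso(alpha=0.1),                           # L1 penalty, drives coefficients to zero
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),   # mix of L1 and L2
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test R^2 = {r2_score(y_test, model.predict(X_test)):.3f}")
```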
We saw, once again, the danger of relying on training error to estimate generalization: it can be wildly optimistic, to the point where a model achieves zero training error while being completely useless on unseen data. Thinking through these issues led us to two-level (nested) cross-validation, an important idea that many in the field still have not completely internalized.
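To make the two levels concrete, here is a hedged sketch of nested cross-validation in scikit-learn: the inner loop (handled here by ElasticNetCV) selects the penalty using only the training folds, while the outer cross_val_score loop estimates generalization on data that played no part in that selection. The dataset and the candidate l1_ratio values are illustrative assumptions:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# Inner level: choose the regularization path by cross-validation
# on the training folds only.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=inner_cv)

# Outer level: estimate how well the whole procedure
# (including hyperparameter selection) generalizes.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=outer_cv, scoring="r2")
print("Nested CV R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

Reporting only the inner-loop score would repeat the mistake described above, since the hyperparameters were chosen to look good on exactly that data.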
Throughout this chapter, we were able to rely on scikit-learn to support...