Regularization Techniques
The main goal of a data scientist is to train a model that achieves high performance and generalizes to unseen data well. The model should be able to predict the right outcome on both data used during the training process and new data. This is the reason why a model is always assessed on the test set. This set of data serves as a proxy to evaluate the ability of the model to output correct results while in production.
In Figure 6.1, the linear model (line) seems to predict relatively accurate results for both the training (circles) and test (triangles) sets.
But sometimes a model fails to generalize well and will overfit the training set. In this case, the performance of the model will be very different between the training and test sets.
Figure 6.2 shows the model (line) has only learned to predict accurately for the...