Regularizing with ridge regression
A very common and useful way to regularize a linear regression is to penalize the loss function. In this recipe, after reviewing what adding a penalty to the loss function means in the case of ridge regression, we will train a ridge model on the same California housing dataset as in the previous recipe and see how regularization can improve the score.
Getting ready
One way to make sure that a model's parameters do not overfit is to keep them close to zero: if the parameters cannot grow freely, they are less likely to fit the noise in the training data.
To that end, ridge regression adds a new term (the regularization term) to the loss:

$$L = \mathrm{MSE}(y, \hat{y}) + \lambda \lVert w \rVert_2^2$$

where $\lVert w \rVert_2^2$ is the squared L2 norm of $w$:

$$\lVert w \rVert_2^2 = \sum_{j} w_j^2$$
With this loss, we intuitively understand that large values of the weights $w$ are penalized, and thus overfitting is less likely. Also, $\lambda$ is a hyperparameter (it can be fine-tuned) that controls the strength of the regularization: the larger $\lambda$ is, the more the weights are pushed toward zero.
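As a minimal sketch of what this looks like in practice, the following code fits a ridge model on the California housing dataset with scikit-learn. Note that scikit-learn exposes $\lambda$ as the alpha parameter of Ridge; the alpha value, the train/test split, and the use of StandardScaler here are illustrative choices, not the recipe's exact settings:

```python
# Minimal sketch: ridge regression on the California housing dataset
# (assumes scikit-learn is installed; alpha=1.0 is an illustrative value)
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Load the data and split it into train and test sets
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale the features and fit a ridge model; alpha plays the role of lambda
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)

# R^2 scores on the train and test sets
print("train R^2:", model.score(X_train, y_train))
print("test R^2:", model.score(X_test, y_test))
```

Increasing alpha strengthens the penalty on the weights, which typically lowers the training score slightly while improving generalization on the test set.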