Ridge, Lasso, and ElasticNet
Ridge regression adds a shrinkage penalty to the ordinary least squares loss function in order to limit the squared L2 norm of the weight vector.
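Assuming the standard scikit-learn formulation (here y denotes the target vector), the penalized loss can be written as:

L(w) = ||Xw - y||_2^2 + alpha * ||w||_2^2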
In this case, X is a matrix containing all samples as rows and w is the weight vector. The additional term (weighted by the coefficient alpha: larger values imply stronger regularization and smaller weights) prevents w from growing without bound, a problem that can be caused by multicollinearity or ill-conditioning. The following figure shows what happens when a Ridge penalty is applied:
The gray surface represents the loss function (here, for simplicity, we're working with only two weights), while the circle centered at O is the boundary imposed by the Ridge condition. The minimum lies at smaller w values, and potential explosions of the weights are avoided.
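The circle can be read through the standard constrained formulation of Ridge: for some radius t (a value determined by alpha, introduced here only for illustration), minimizing the penalized loss is equivalent to solving

min_w ||Xw - y||_2^2   subject to   ||w||_2^2 <= t

so the boundary in the figure is exactly the circle ||w||_2^2 = t.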
In the following snippet, we're going to compare LinearRegression and Ridge with a cross-validation:
from sklearn.datasets import load_diabetes...
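A minimal runnable sketch of this comparison could look like the following; it assumes the diabetes dataset, 10-fold cross_val_score with the default R^2 scorer, and a purely illustrative alpha value:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Load the diabetes dataset (442 samples, 10 numeric features)
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Ordinary least squares, scored with 10-fold cross-validation
lr = LinearRegression()
lr_scores = cross_val_score(lr, X, y, cv=10)
print('LinearRegression average CV R^2: %.3f' % lr_scores.mean())

# Ridge with a small, illustrative penalty (tune alpha for real use)
rg = Ridge(alpha=0.005)
rg_scores = cross_val_score(rg, X, y, cv=10)
print('Ridge average CV R^2: %.3f' % rg_scores.mean())

With such a small alpha, the Ridge result stays close to plain least squares; larger values shrink w more aggressively.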