Model Regularization with Lasso Regression
As mentioned at the beginning of this chapter, models can overfit the training data. One common cause is having too many features with large coefficients (also called weights). The key to solving this type of overfitting is reducing the magnitude of the coefficients.
You may recall that weights are optimized during model training. One method for optimizing weights is gradient descent, whose update rule requires a differentiable loss function (a short gradient-descent sketch follows the list below). Two common examples of such loss functions are:
- Mean Absolute Error (MAE), which is differentiable everywhere except at zero, where a subgradient is used in practice
- Mean Squared Error (MSE)
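To make the update rule concrete, here is a minimal sketch of gradient descent minimizing MSE on synthetic one-dimensional data. The data, learning rate, and iteration count are illustrative assumptions, not values from this chapter:

```python
import numpy as np

# Toy data: y = 3x + noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0  # weight and bias, initialized to zero
lr = 0.1         # learning rate

for _ in range(500):
    error = (w * X[:, 0] + b) - y
    # Gradients of MSE with respect to w and b
    grad_w = 2 * np.mean(error * X[:, 0])
    grad_b = 2 * np.mean(error)
    # Gradient update rule: step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # w should approach 3
```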
For lasso regression, a penalty term is added to the loss function: the sum of the absolute values of the coefficients (the L1 norm), scaled by a constant that controls the strength of the regularization. That constant is the regularization parameter. The technicalities of this implementation are hidden by the class.
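As an illustration, here is a minimal sketch assuming scikit-learn's Lasso class, whose alpha parameter is the regularization parameter (the class and data below are assumptions for demonstration, not part of the chapter's exercise):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only 2 of the 10 features are informative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# alpha scales the L1 penalty: loss = MSE + alpha * sum(|w_i|)
model = Lasso(alpha=0.1)
model.fit(X, y)

# The L1 penalty drives coefficients of uninformative features toward zero
print(np.round(model.coef_, 3))
```

A useful side effect of the L1 penalty is feature selection: coefficients of uninformative features are shrunk all the way to zero rather than merely made small.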
Consider the following exercise, in which you deliberately over-engineer a model to introduce overfitting and then use lasso regression to obtain better results.
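As a preview of that workflow, here is a minimal sketch under assumed values: the dataset, the polynomial degree, and the alpha are illustrative choices, not the exercise's actual setup. A high-degree polynomial fit with ordinary least squares overfits; refitting the same features with lasso narrows the gap between training and test scores:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Noisy quadratic data: a degree-15 fit is deliberate over-engineering
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(60, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, reg in [("OLS", LinearRegression()),
                  ("Lasso", Lasso(alpha=0.01, max_iter=50_000))]:
    # Same over-engineered features; only the regularization differs
    model = make_pipeline(PolynomialFeatures(degree=15, include_bias=False),
                          StandardScaler(), reg)
    model.fit(X_train, y_train)
    print(name,
          "train R2:", round(r2_score(y_train, model.predict(X_train)), 3),
          "test R2:", round(r2_score(y_test, model.predict(X_test)), 3))
```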