When working with regressions, we may add a penalty term to the cost function to reduce overfitting by penalizing certain choices of coefficients; this is called regularization. We are looking for the coefficients that minimize this penalized cost function. The idea is to shrink the coefficients toward zero for features that don't contribute much to reducing the model's error. Some common techniques are ridge regression, LASSO (short for Least Absolute Shrinkage and Selection Operator) regression, and elastic net regression, which combines the LASSO and ridge penalty terms.
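To make the difference between these techniques concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (the `alpha` values are illustrative, not tuned), that fits all three models and prints their coefficients:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data where only a few features are informative.
X, y = make_regression(
    n_samples=100, n_features=10, n_informative=3, noise=10, random_state=0
)

for model in (Ridge(alpha=1.0), Lasso(alpha=1.0), ElasticNet(alpha=1.0, l1_ratio=0.5)):
    model.fit(X, y)
    # LASSO and elastic net can shrink some coefficients exactly to zero;
    # ridge only shrinks them toward zero.
    print(type(model).__name__, model.coef_.round(2))
```

Note that the LASSO penalty tends to produce exact zeros, which is why it doubles as a feature selection tool, while the ridge penalty spreads shrinkage across all of the coefficients.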
Ridge regression, also called L2 regularization, punishes large coefficients (β) by adding the sum of the squares of the coefficients to the cost function (which regression looks to minimize when fitting), as per the following penalty term:

$$\lambda \sum_{j=1}^{m} \beta_j^2$$
This penalty term...
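As a quick sanity check on the formula, the following sketch, assuming NumPy and using made-up coefficient values and λ, computes the L2 penalty directly:

```python
import numpy as np

# Hypothetical fitted coefficients and regularization strength (lambda);
# the values here are illustrative only.
beta = np.array([0.5, -2.0, 3.0])
lam = 1.0

# lambda * sum of squared coefficients
l2_penalty = lam * np.sum(beta ** 2)
print(l2_penalty)  # 13.25
```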