You may recall that our linear model follows the form Y = B0 + B1x1 + ... + Bnxn + e, and that the best fit minimizes the RSS, the sum of the squared residuals (actual minus estimate): e1² + e2² + ... + en².
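As a minimal sketch of this idea, assuming NumPy and made-up toy data, we can fit a one-predictor model by least squares and compute the RSS directly from the residuals:

```python
import numpy as np

# Toy data for a single-predictor linear model (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column, so beta = [B0, B1].
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: the fit that minimizes the RSS.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals e_i = actual - estimate; RSS = e1^2 + e2^2 + ... + en^2.
residuals = y - X @ beta
rss = np.sum(residuals ** 2)
```

Any other choice of coefficients would produce a larger sum of squared residuals on this data.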
With regularization, we'll apply what is known as a shrinkage penalty in conjunction with minimizing the RSS. This penalty consists of a lambda (symbol λ) multiplied by a norm of the beta coefficients. How the coefficients are normed differs between techniques, and we'll discuss each accordingly. Quite simply, in our model we're minimizing RSS + λ(normed coefficients). We'll select λ, known as the tuning parameter, during the model-building process. Please note that if lambda is equal to 0, the penalty term vanishes and our model is equivalent to OLS.
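A minimal sketch of this penalty, assuming NumPy, synthetic data, and the squared (L2) norm used by ridge regression: the closed-form solution below minimizes RSS + λ·Σβj², and setting λ = 0 reproduces the OLS fit. The intercept is omitted here for simplicity, since the penalty conventionally excludes it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    """Closed-form minimizer of RSS + lam * sum(beta_j^2) (the L2 penalty)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_lam0 = ridge(X, y, 0.0)     # lambda = 0: penalty drops out, same as OLS
beta_shrunk = ridge(X, y, 10.0)  # larger lambda shrinks the coefficients
```

Increasing λ pulls the coefficient vector toward zero, which is exactly the shrinkage the penalty is designed to impose.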