Regularization to improve model generalizability
We learned in the previous chapter that high model complexity can cause overfitting. Regularization is one approach to controlling model complexity and damping the effect of features that hurt model generalizability. In regularization, we add a regularization, or penalty, term to the loss function that is optimized during training. In the simple case of linear modeling, regularization can be added to the loss function as follows:

L(W) + Ω(W)
where the first term, L(W), is the loss and Ω(W) is the regularization term, a function of the model weights, or parameters, W. Regularization can also be used with other machine learning methods, such as SVMs or LightGBM (refer to Table 5.2). Three common regularization terms are shown in the following table: L1 regularization, L2 regularization, and their combination, known as elastic net.
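To make the idea concrete, here is a minimal sketch of how a penalty term can be added to a squared-error loss for a linear model. The function name, parameter names (`lam` for the regularization strength, `alpha` for the L1/L2 mix), and the string values for `penalty` are illustrative assumptions, not from any specific library:

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1, alpha=0.5, penalty="l2"):
    """Mean squared error plus a regularization term on the weights w.

    penalty: "l1" (sum of |w_i|), "l2" (sum of w_i^2), or "elasticnet"
    (a mix of both, weighted by alpha). lam scales the penalty;
    lam=0 recovers the unregularized loss.
    """
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)  # data-fit term: the loss L(W)
    if penalty == "l1":
        omega = np.sum(np.abs(w))  # L1 penalty
    elif penalty == "l2":
        omega = np.sum(w ** 2)     # L2 penalty
    else:
        # elastic net: convex combination of L1 and L2 penalties
        omega = alpha * np.sum(np.abs(w)) + (1 - alpha) * np.sum(w ** 2)
    return mse + lam * omega       # L(W) + lam * Omega(W)
```

Minimizing this regularized loss instead of the plain loss pulls the weights toward zero, which is how the penalty limits model complexity.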