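Adding a parameter norm penalty to the objective function is the most classic of the regularization methods. Its effect is to limit the capacity of the model. This method has been around for several decades and predates the advent of deep learning. Writing $J(\theta)$ for the original cost function and $\Omega(\theta)$ for the norm penalty on the parameters $\theta$, we can write the regularized cost function $\tilde{J}$ as follows:

$$\tilde{J}(\theta) = J(\theta) + \alpha \, \Omega(\theta)$$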
Here, $\alpha \in [0, \infty)$. The value of α in the preceding equation is a hyperparameter that determines how strongly the regularizer affects the regularized cost function. The greater the value of α, the more regularization is applied; the smaller it is, the less effect regularization has on the cost function. Setting α to 0 removes the penalty entirely.
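As a rough illustration of how α scales the penalty, the following sketch adds α times a norm penalty to an unregularized cost. Several pieces are assumptions made for the example and do not come from the text: the model is a bias-free linear model, the cost is the mean squared error, the penalty is the squared L2 norm of the parameters, and all function and variable names are hypothetical.

```python
def cost(theta, inputs, targets):
    """Unregularized cost: mean squared error of a linear model (assumed here)."""
    total = 0.0
    for x, y in zip(inputs, targets):
        prediction = sum(t * xi for t, xi in zip(theta, x))
        total += (prediction - y) ** 2
    return total / len(inputs)


def norm_penalty(theta):
    """Squared L2 norm of the parameters, one common choice of penalty (assumed)."""
    return sum(t * t for t in theta)


def regularized_cost(theta, inputs, targets, alpha):
    """Unregularized cost plus alpha times the norm penalty."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return cost(theta, inputs, targets) + alpha * norm_penalty(theta)


# Toy data, made up purely for illustration.
inputs = [[1.0, 2.0], [3.0, 4.0]]
targets = [1.0, 2.0]
theta = [0.5, -0.5]

# The same parameters are charged more as alpha grows.
for alpha in (0.0, 0.1, 1.0):
    print(alpha, regularized_cost(theta, inputs, targets, alpha))
```

With α set to 0 the regularized cost equals the unregularized one, and each increase in α raises the price of large parameter values, which is what pushes training toward smaller parameters.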
In the case of neural networks, we apply the parameter norm penalties only to the weights, since they control the interaction or relationship between two nodes in successive layers, and we leave the biases unregularized, since they need less data in comparison...