Regularization with network architecture
In this recipe, we will explore a less popular, but still sometimes useful, regularization method: adapting the neural network architecture itself. After reviewing why and when to use this method, we will apply it to the California housing dataset, a regression task.
Getting ready
Sometimes, the best way to regularize is not to use any fancy techniques but only common sense. In many cases, the neural network used is simply too large for the task and dataset at hand. An easy rule of thumb is to have a quick look at the number of parameters in the network (i.e., its weights and biases) and compare it to the number of data points: if the ratio is above 1 (i.e., there are more parameters than data points), there is a risk of severe overfitting.
Note
If transfer learning is used, this rule of thumb no longer applies, since the network has already been pretrained on a presumably large enough dataset.
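The rule of thumb above is easy to check by hand. As a quick sketch, assuming a small fully connected network for the California housing dataset (8 input features; the hidden layer sizes below are purely illustrative), we can count the parameters and compare them to the dataset's 20,640 samples:

```python
# Hypothetical fully connected architecture for the California housing
# dataset: 8 input features, two hidden layers, 1 regression output.
# The hidden layer sizes are illustrative assumptions, not a recommendation.
layer_sizes = [8, 64, 64, 1]

# Each dense layer has (fan_in * fan_out) weights plus fan_out biases.
n_params = sum(
    fan_in * fan_out + fan_out
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])
)

n_samples = 20_640  # number of rows in the California housing dataset

print(f"parameters: {n_params}")                     # 4801
print(f"params/samples: {n_params / n_samples:.3f}")  # ~0.233
```

Here the ratio is well below 1, so this particular architecture passes the quick sanity check; doubling the hidden layer widths a few times would quickly push the ratio past 1 and into the danger zone the rule warns about.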
If we take a step back and go back to linear...