One really great way to reduce overfitting in deep neural networks is to employ a technique called dropout. Dropout does exactly what it says: it drops neurons out of a hidden layer. Here's how it works.
For every minibatch, we randomly choose to turn off nodes in each hidden layer. Imagine we had some hidden layer where we had implemented dropout, and we chose the drop probability to be 0.5. That means, for every minibatch, for every neuron, we flip a coin to decide whether we use that neuron. In doing so, we'd randomly turn off roughly half of the neurons in that hidden layer.
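As a rough sketch of that per-neuron coin flip (not tied to any particular framework), here's what it might look like in NumPy. This uses the common "inverted dropout" convention, where the surviving activations are scaled up during training so the expected activation stays the same at inference time; the layer sizes and values are just placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, drop_prob=0.5, training=True):
    """Apply (inverted) dropout to one hidden layer's activations."""
    if not training or drop_prob == 0.0:
        return activations
    # "Flip a coin" for every neuron: keep it with probability 1 - drop_prob.
    keep_mask = rng.random(activations.shape) >= drop_prob
    # Scale the survivors by 1 / (1 - drop_prob) so the expected activation
    # matches what the network will see at inference time, when nothing is dropped.
    return activations * keep_mask / (1.0 - drop_prob)

# A toy hidden layer of 8 neurons for one example in a minibatch.
hidden = np.array([[0.3, 1.2, -0.7, 0.9, 0.1, -1.4, 0.5, 2.0]])
print(dropout_forward(hidden, drop_prob=0.5))  # roughly half the entries come back as zero
```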
If we do this over and over again, it's like we're training many smaller networks. The model weights stay relatively small, and each smaller network is less likely to overfit the data. It also forces each neuron to be less dependent on the presence of any particular other neuron, since that neuron might be dropped on the next minibatch.
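If you're working in a framework like PyTorch, you don't need to write the mask yourself; dropout comes as a ready-made layer. Here's a minimal sketch of a hidden layer with a drop probability of 0.5 (the layer sizes here are hypothetical, just for illustration):

```python
import torch.nn as nn

# Hypothetical sizes, just for illustration.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden activation is zeroed with probability 0.5 during training
    nn.Linear(256, 10),
)

model.train()  # dropout is active: roughly half the hidden activations are zeroed per forward pass
model.eval()   # dropout is disabled at inference time, and no activations are dropped
```

Note that the layer only drops activations while the model is in training mode; switching to evaluation mode turns dropout off, which is exactly the training/inference distinction described above.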