In this section, we're going to see what happens when we train deep networks with many layers, perhaps more than 30 or even more than 100. We'll then present residual networks as a solution for scaling to that many layers, along with an example architecture that achieves state-of-the-art accuracy.
Using residual networks for image recognition
Deep network performance
In theory, it's clear that the more layers we add to a neural network, the better it should perform. This is shown by the green line in the following graph: as we add more layers, the error rate goes down, ideally approaching zero. Unfortunately, in practice, we see this happening only partially. It's true that the error rate goes down...