Deep residual networks (ResNet)
One key advantage of deep networks is their ability to learn different levels of representation from both the inputs and the feature maps. In classification, segmentation, detection, and a number of other computer vision problems, learning different levels of features generally leads to better performance.
However, you'll find that it's not easy to train deep networks because the gradient vanishes (or explodes) in the shallow layers during backpropagation. Figure 2.2.1 illustrates the problem of vanishing gradients. The network parameters are updated by backpropagation from the output layer back through all previous layers. Since backpropagation is based on the chain rule, the gradient is a product of per-layer terms, so it tends to diminish by the time it reaches the shallow layers. This is due to the repeated multiplication of small numbers, especially when the errors and parameters have small absolute values.
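As a rough illustration of why this happens, consider that the chain rule multiplies one local derivative per layer. The sigmoid activation, for example, has a derivative of at most 0.25, so in the worst case a stack of sigmoid layers scales the gradient reaching the shallowest layer by at most 0.25 per layer. The sketch below (the function name and depths are illustrative, not from the text) computes this upper bound:

```python
def max_gradient_scale(depth, max_local_grad=0.25):
    """Upper bound on the gradient scale reaching the shallowest layer
    of a `depth`-layer stack, assuming each layer contributes a local
    derivative of magnitude at most `max_local_grad` (0.25 for sigmoid)."""
    scale = 1.0
    for _ in range(depth):
        # chain rule: one multiplicative factor per layer
        scale *= max_local_grad
    return scale

for depth in (2, 10, 50):
    print(f"depth={depth:2d}  gradient scale <= {max_gradient_scale(depth):.3e}")
```

Even at a modest depth of 10, the bound is already 0.25**10, which is below 1e-6; at depth 50 the gradient is effectively zero, so the shallow layers stop learning. This exponential decay with depth is exactly what residual connections are designed to counteract.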
The number of multiplication operations will be proportional to the depth...