Residual networks (ResNets) were introduced in 2015 in the paper Deep Residual Learning for Image Recognition (https://arxiv.org/abs/1512.03385), and they won first place in the five main tracks of that year's ImageNet and COCO competitions. In Chapter 1, The Nuts and Bolts of Neural Networks, we mentioned that the layers of a neural network are not restricted to a sequential order but instead form a graph. This is the first architecture we'll study that takes advantage of this flexibility. It's also the first architecture to successfully train a network with a depth of more than 100 layers.
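The key structural idea that departs from a purely sequential graph is the skip connection: a block computes a transform F(x) and adds the original input back, so the block outputs F(x) + x. Here is a minimal NumPy sketch of that idea, assuming a simple two-layer fully connected transform for F; the function and weight names are illustrative, not from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Toy residual block: y = relu(F(x) + x).

    F is an illustrative two-layer transform (linear -> relu -> linear);
    the skip connection adds the unchanged input x to F's output.
    """
    f = relu(x @ w1) @ w2   # F(x): the residual branch
    return relu(f + x)      # skip connection: add the input back

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # batch of 4 vectors of width 8
w1 = 0.1 * rng.standard_normal((8, 8))
w2 = 0.1 * rng.standard_normal((8, 8))
y = residual_block(x, w1, w2)
print(y.shape)  # (4, 8): shape is preserved, so blocks can be stacked
```

Note that if the residual branch contributes nothing (for example, with zero weights, F(x) = 0), the block reduces to relu(x), passing the input almost unchanged. This is why very deep stacks of such blocks remain trainable: each block only needs to learn a correction to the identity.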
Thanks to better weight initialization, new activation functions, and normalization layers, it's now possible to train deep networks. However, the authors of the paper conducted some experiments and observed that a network with 56 layers had higher training and testing...