Residual connections
While very deep architectures (those with many layers) perform better, they are harder to train, because the input signal weakens as it passes through the layers. Some approaches have addressed this by training deep networks in multiple stages, layer by layer.
An alternative to this layer-wise training is to add a supplementary connection that shortcuts a block of layers, named the identity connection, which passes the signal through without modification. It runs alongside the classic convolutional layers, named the residuals, and together they form a residual block, as shown in the following image:
Such a residual block is composed of six layers.
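The following is a minimal sketch of such a block in Lasagne; the function name residual_block, the 3x3 filter size, and the exact layer ordering are assumptions for illustration rather than the author's listing, and the cuDNN variant Conv2DDNNLayer from lasagne.layers.dnn can stand in for Conv2DLayer:

from lasagne.layers import (Conv2DLayer, BatchNormLayer,
                            NonlinearityLayer, ElemwiseSumLayer)
from lasagne.nonlinearities import rectify, identity

def residual_block(incoming, num_filters):
    # Residual branch: convolution, batch normalization, non-linearity,
    # then a second convolution and batch normalization
    conv1 = Conv2DLayer(incoming, num_filters, filter_size=3, pad='same',
                        nonlinearity=identity)
    bn1 = BatchNormLayer(conv1)
    relu1 = NonlinearityLayer(bn1, nonlinearity=rectify)
    conv2 = Conv2DLayer(relu1, num_filters, filter_size=3, pad='same',
                        nonlinearity=identity)
    bn2 = BatchNormLayer(conv2)
    # Identity connection: the unmodified input is added to the residuals
    # (assumes incoming already carries num_filters feature maps)
    summed = ElemwiseSumLayer([incoming, bn2])
    return NonlinearityLayer(summed, nonlinearity=rectify)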
A residual network is a network composed of multiple residual blocks. Its input is processed by a first convolution, followed by batch normalization and a non-linearity.
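A minimal sketch of this stem in Lasagne, reusing residual_block from the sketch above; the hyperparameters (two blocks, eight feature maps, a 28x28 single-channel input) match the example that follows:

from lasagne.layers import InputLayer

def build_residual_net(num_blocks=2, num_filters=8):
    # Stem: first convolution, batch normalization, non-linearity
    net = InputLayer((None, 1, 28, 28))
    net = Conv2DLayer(net, num_filters, filter_size=3, pad='same',
                      nonlinearity=identity)
    net = BatchNormLayer(net)
    net = NonlinearityLayer(net, nonlinearity=rectify)
    # Stack the residual blocks on top of the stem
    for _ in range(num_blocks):
        net = residual_block(net, num_filters)
    return net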
For example, for a residual network composed of two residual blocks, with eight feature maps in the first convolution, applied to an input image of size 28x28, the layer output shapes will be the following:
InputLayer (None, 1, 28, 28)
Conv2DDNNLayer ...
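Such a shape listing can be produced by walking the layer graph; a sketch, assuming the build_residual_net function defined above:

import lasagne.layers as L

network = build_residual_net(num_blocks=2, num_filters=8)
# Print each layer's class name and output shape, input layer first
for layer in L.get_all_layers(network):
    print(layer.__class__.__name__, layer.output_shape)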