Dense connections
Stochastic depth skips randomly chosen layers by replacing them with a direct connection. Dense connections go one step further: instead of removing random layers, each layer is given an identity connection with the previous layers.
As with residual blocks, a densely connected convolutional network is built by repeating dense blocks to form a stack of layer blocks.
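To make the connectivity pattern concrete, here is a minimal NumPy sketch of one dense block. It is not the implementation used in this chapter: the names `toy_layer` and `toy_dense_block`, the `growth` parameter, and the random 1x1 convolution standing in for a real convolutional layer are all assumptions made for illustration. It also assumes the identity connection is realized by concatenating feature maps along the channel axis, which is the usual choice in densely connected networks.

```python
import numpy as np

def toy_layer(x, out_channels, rng):
    # Hypothetical stand-in for a real convolutional layer:
    # a random 1x1 convolution followed by a ReLU.
    in_channels = x.shape[0]
    weights = rng.standard_normal((out_channels, in_channels)) * 0.1
    # Contract the channel axis; the spatial dimensions are unchanged.
    mixed = np.tensordot(weights, x, axes=([1], [0]))
    return np.maximum(mixed, 0.0)

def toy_dense_block(x, n_layers=3, growth=4, seed=0):
    rng = np.random.default_rng(seed)
    features = x  # shape: (channels, height, width)
    for _ in range(n_layers):
        # Each layer sees all feature maps produced so far.
        new_maps = toy_layer(features, growth, rng)
        # Identity connection: keep the earlier maps and append the new ones.
        features = np.concatenate([features, new_maps], axis=0)
    return features

x = np.ones((16, 8, 8))
out = toy_dense_block(x, n_layers=3, growth=4)
print(out.shape)  # (28, 8, 8): 16 input channels + 3 layers * 4 new channels
```

The input maps are passed through unchanged (`out[:16]` equals `x`), and the channel count grows linearly with the number of layers in the block.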
This architecture follows the same principle as the highway networks seen in Chapter 10, Predicting Times Sequence with Advanced RNN: the identity connection helps information propagate forward and gradients propagate backward through the network, which reduces the effect of exploding/vanishing gradients when the number of layers is high.
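The effect on vanishing gradients can be illustrated with a scalar toy model. This is a simplification, not a faithful model of a dense block: each layer is reduced to a multiplication by a single weight `w`, and a sum stands in for the identity connection. The function names are hypothetical.

```python
def plain_gradient(w, n_layers):
    # Plain chain h_k = w * h_(k-1): the gradient of h_n with respect
    # to the input is the product of n factors w.
    grad = 1.0
    for _ in range(n_layers):
        grad *= w
    return grad

def shortcut_gradient(w, n_layers):
    # Chain with an identity path, s_k = s_(k-1) + w * s_(k-1):
    # each step contributes a factor (1 + w), where the 1 comes
    # from the identity connection.
    grad = 1.0
    for _ in range(n_layers):
        grad *= 1.0 + w
    return grad

w, n_layers = 0.1, 20
print(plain_gradient(w, n_layers))     # vanishes, on the order of 1e-20
print(shortcut_gradient(w, n_layers))  # stays above 1 for any w >= 0
```

With small weights, the plain chain multiplies the gradient toward zero, while the identity path guarantees that part of the gradient reaches the input unattenuated.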
In Python, we replace our residual block with a densely connected block:
def dense_block(network, transition=False, first=False, filters=16): ...