Highway networks design principle
Adding more layers to the transition connections worsens the vanishing or exploding gradient problem during backpropagation over long-term dependencies.
In Chapter 4, Generating Text with a Recurrent Neural Net, LSTM and GRU networks were introduced as solutions to this issue. Second-order optimization techniques also help overcome this problem.
A more general principle for improving the training of deep networks, based on identity connections (see Chapter 7, Classifying Images with Residual Networks), can also be applied to deep transition networks.
Here is the principle in theory:
Given an input x to a hidden layer H with weights W_H:
y = H(x, W_H)
A highway network design consists of adding the original input information (through an identity layer) to the output of a layer or a group of layers, as a shortcut:
y = H(x, W_H) + x
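The shortcut described above, where the input is added back to the output of the layer, can be sketched in a few lines. This is a minimal NumPy illustration rather than the chapter's own implementation: the function names hidden_layer and shortcut_layer, the tanh non-linearity, and the bias b_H are assumptions made for the example. With all weights set to zero the transformation contributes nothing, so the input passes through unchanged, which is the property that keeps information flowing through deep stacks.

```python
import numpy as np

def hidden_layer(x, W_H, b_H):
    """Plain transformation H(x, W_H): an affine map followed by tanh (assumed)."""
    return np.tanh(x @ W_H + b_H)

def shortcut_layer(x, W_H, b_H):
    """Layer with an identity shortcut: y = H(x, W_H) + x."""
    # The shortcut requires the output of H to have the same shape as x.
    return hidden_layer(x, W_H, b_H) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # batch of 4 inputs with 8 features each
W_H = np.zeros((8, 8))        # zero weights, so H(x) = tanh(0) = 0
b_H = np.zeros(8)

# With a silent transformation, the shortcut carries x through unchanged.
y = shortcut_layer(x, W_H, b_H)
```

Because the identity term does not depend on W_H, the derivative of y with respect to x always contains an identity component, whatever the transformation does.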
Two mixing gates, the transform gate and the carry gate, learn to modulate the influence of the transformation in the hidden layer, and the amount of original...
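As a rough illustration of the two gates, the sketch below follows the commonly used highway formulation y = H(x, W_H) * T(x, W_T) + x * C(x, W_C). Several choices here are assumptions made for the example and are not taken from the text: the carry gate is tied to the transform gate as C = 1 - T, H uses tanh, T uses a sigmoid, and the names highway_layer, W_T and b_T are hypothetical. NumPy is used only to keep the example self-contained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """Highway layer sketch: y = H * T + x * C, with C = 1 - T (assumed)."""
    h = np.tanh(x @ W_H + b_H)   # candidate transformation H(x, W_H)
    t = sigmoid(x @ W_T + b_T)   # transform gate T(x, W_T), values in (0, 1)
    c = 1.0 - t                  # carry gate, assumed complementary to T
    return h * t + x * c

rng = np.random.default_rng(0)
n = 8
x = rng.normal(size=(4, n))
W_H = rng.normal(scale=0.1, size=(n, n))
W_T = rng.normal(scale=0.1, size=(n, n))
b_H = np.zeros(n)

# A strongly negative gate bias closes the transform gate: the input is carried.
y_carry = highway_layer(x, W_H, b_H, W_T, np.full(n, -20.0))
# A strongly positive gate bias opens it: the layer behaves like a plain layer.
y_transform = highway_layer(x, W_H, b_H, W_T, np.full(n, 20.0))
```

The two extreme biases show the range the gates can cover: at one end the layer copies its input, at the other it applies the full transformation, and training is free to settle anywhere in between for each unit.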