Recurrent Highway Networks
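So, let's apply the highway network design to deep transition recurrent networks, which leads to the definition of Recurrent Highway Networks (RHN). An RHN predicts the output y_t given the input x_t of the transition:

$$y_t = s_t^{[L]}, \qquad s_t^{[0]} = y_{t-1}$$

where L is the depth of the transition and s_t^{[l]} is the state after the l-th highway layer.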
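The transition is built with multiple steps of highway connections, for 1 \le l \le L:

$$s_t^{[l]} = h_t^{[l]} \odot t_t^{[l]} + s_t^{[l-1]} \odot c_t^{[l]}$$

$$h_t^{[l]} = \tanh\left(W_H x_t \mathbb{1}_{\{l=1\}} + R_{H_l} s_t^{[l-1]} + b_{H_l}\right)$$

The indicator \mathbb{1}_{\{l=1\}} means that the input x_t only enters the first highway layer of the transition.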
Here the transform gate is as follows:

$$t_t^{[l]} = \sigma\left(W_T x_t \mathbb{1}_{\{l=1\}} + R_{T_l} s_t^{[l-1]} + b_{T_l}\right)$$
And, to reduce the number of weights, the carry gate is taken as the complement of the transform gate:

$$c_t^{[l]} = 1 - t_t^{[l]}$$
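To make the highway transition concrete, here is a minimal NumPy sketch of one RHN time step with coupled gates (carry = 1 - transform). It is not the book's Theano code. The function name `rhn_step`, the weight names `W_H`, `W_T`, `R_H`, `R_T`, `b_H`, `b_T` and the row-vector convention (`x @ W` instead of `W x`) are illustrative assumptions. Dropout on the state is left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rhn_step(x_t, y_tm1, W_H, W_T, R_H, R_T, b_H, b_T):
    """One RHN time step with a deep transition of depth L = len(R_H).

    Row-vector convention (assumed): x_t is (batch, n_in), y_tm1 is
    (batch, n_hid), W_H and W_T are (n_in, n_hid); R_H, R_T, b_H, b_T are
    lists holding one (n_hid, n_hid) matrix or (n_hid,) bias per layer.
    """
    s = y_tm1  # s_t^{[0]} = y_{t-1}
    for l in range(len(R_H)):
        # the input x_t only enters the first highway layer (l = 1 in the text)
        in_H = x_t @ W_H if l == 0 else 0.0
        in_T = x_t @ W_T if l == 0 else 0.0
        h = np.tanh(in_H + s @ R_H[l] + b_H[l])  # candidate h_t^{[l]}
        t = sigmoid(in_T + s @ R_T[l] + b_T[l])  # transform gate t_t^{[l]}
        c = 1.0 - t                              # coupled carry gate c_t^{[l]}
        s = h * t + s * c                        # highway update s_t^{[l]}
    return s  # y_t = s_t^{[L]}


# Usage with hypothetical sizes and random weights
rng = np.random.default_rng(0)
batch, n_in, n_hid, depth = 2, 3, 4, 5
W_H = rng.normal(size=(n_in, n_hid))
W_T = rng.normal(size=(n_in, n_hid))
R_H = [rng.normal(size=(n_hid, n_hid)) for _ in range(depth)]
R_T = [rng.normal(size=(n_hid, n_hid)) for _ in range(depth)]
b_H = [np.zeros(n_hid) for _ in range(depth)]
b_T = [np.full(n_hid, -1.0) for _ in range(depth)]  # negative bias favors carrying
y_t = rhn_step(rng.normal(size=(batch, n_in)), np.zeros((batch, n_hid)),
               W_H, W_T, R_H, R_T, b_H, b_T)
print(y_t.shape)  # (2, 4)
```

Tying the carry gate to the transform gate removes one full set of gate weights per layer. When the transform gate is nearly closed, the previous output simply flows through the whole transition unchanged.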
For faster computation on a GPU, it is better to compute the linear transformations of the inputs for all time steps in a single big matrix multiplication, producing the all-steps input matrices i_for_H and i_for_T at once. The GPU parallelizes one large product better than many small ones. These inputs are then provided to the recurrence:
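# initial recurrent state, one row per sequence in the batch
y_0 = shared_zeros((batch_size, hidden_size))
# iterate deep_step_fn over time, feeding the precomputed input projections
y, _ = theano.scan(deep_step_fn,
                   sequences=[i_for_H, i_for_T],
                   outputs_info=[y_0],
                   non_sequences=[noise_s])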
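The same idea can be sketched in NumPy under stated assumptions. The input projections for every time step are computed with one large matrix product per gate (the analogue of i_for_H and i_for_T), and a plain Python loop over time stands in for theano.scan. The names `rhn_sequence`, `W_H`, `W_T`, `R_H`, `R_T`, `b_H`, `b_T` are hypothetical. The dropout mask passed as noise_s in the Theano code is not modeled here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rhn_sequence(x, y_0, W_H, W_T, R_H, R_T, b_H, b_T):
    """Run an RHN over a whole sequence x of shape (n_steps, batch, n_in)."""
    n_steps, batch, n_in = x.shape
    # Input projections for ALL time steps: one (n_steps*batch, n_in) x
    # (n_in, n_hid) product per gate instead of n_steps small ones.
    i_for_H = (x.reshape(-1, n_in) @ W_H).reshape(n_steps, batch, -1)
    i_for_T = (x.reshape(-1, n_in) @ W_T).reshape(n_steps, batch, -1)

    y_tm1, outputs = y_0, []
    # This Python loop plays the role of theano.scan over the time axis.
    for i_for_H_t, i_for_T_t in zip(i_for_H, i_for_T):
        s = y_tm1
        for l in range(len(R_H)):  # deep transition between time steps
            in_H = i_for_H_t if l == 0 else 0.0  # input only at first layer
            in_T = i_for_T_t if l == 0 else 0.0
            h = np.tanh(in_H + s @ R_H[l] + b_H[l])
            t = sigmoid(in_T + s @ R_T[l] + b_T[l])
            s = h * t + s * (1.0 - t)  # carry gate c = 1 - t
        y_tm1 = s
        outputs.append(s)
    return np.stack(outputs)  # (n_steps, batch, n_hid)


# Usage with hypothetical sizes and random weights
rng = np.random.default_rng(1)
n_steps, batch, n_in, n_hid, depth = 6, 2, 3, 4, 3
x = rng.normal(size=(n_steps, batch, n_in))
W_H = rng.normal(size=(n_in, n_hid))
W_T = rng.normal(size=(n_in, n_hid))
R_H = [rng.normal(scale=0.5, size=(n_hid, n_hid)) for _ in range(depth)]
R_T = [rng.normal(scale=0.5, size=(n_hid, n_hid)) for _ in range(depth)]
b_H = [np.zeros(n_hid) for _ in range(depth)]
b_T = [np.full(n_hid, -1.0) for _ in range(depth)]
ys = rhn_sequence(x, np.zeros((batch, n_hid)), W_H, W_T, R_H, R_T, b_H, b_T)
print(ys.shape)  # (6, 2, 4)
```

Only the recurrent products `s @ R_H[l]` and `s @ R_T[l]` must stay inside the loop, because they depend on the previous state. The input terms do not, so they can be hoisted out and batched.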
With a deep transition between each step:
def deep_step_fn(i_for_H_t, i_for_T_t, y_tm1, noise_s...