Recurrent Highway Networks
Let's now apply the highway network design to deep transition recurrent networks, which leads to the definition of Recurrent Highway Networks (RHN), predicting the output given the input of the transition:
![Recurrent Highway Networks](https://static.packt-cdn.com/products/9781786465825/graphics/graphics/B05525_10_11.jpg)
The transition is built with multiple steps of highway connections:
![Recurrent Highway Networks](https://static.packt-cdn.com/products/9781786465825/graphics/graphics/B05525_10_12.jpg)
![Recurrent Highway Networks](https://static.packt-cdn.com/products/9781786465825/graphics/graphics/B05525_10_07.jpg)
Here the transform gate is as follows:
![Recurrent Highway Networks](https://static.packt-cdn.com/products/9781786465825/graphics/graphics/B05525_10_14.jpg)
And, to reduce the number of weights, the carry gate is taken as the complement of the transform gate:
![Recurrent Highway Networks](https://static.packt-cdn.com/products/9781786465825/graphics/graphics/B05525_10_10.jpg)
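As a concrete illustration, one highway micro-step can be sketched in NumPy (the weight names `W_h`, `R_h`, `W_t`, `R_t` and the shapes are illustrative assumptions, not the book's exact variables):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_step(x, s_prev, W_h, R_h, b_h, W_t, R_t, b_t):
    """One highway micro-step: mix a candidate update with the previous state.

    h : candidate update (non-linear transform of input and previous state)
    t : transform gate; the carry gate is its complement (1 - t)
    """
    h = np.tanh(x @ W_h + s_prev @ R_h + b_h)
    t = sigmoid(x @ W_t + s_prev @ R_t + b_t)
    # s = h * t + s_prev * (1 - t): the carry gate complements the transform gate
    return h * t + s_prev * (1.0 - t)

# Tiny example: batch of 2, input dim 3, hidden dim 4
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))
s = np.zeros((2, 4))
W_h, W_t = rng.standard_normal((2, 3, 4))
R_h, R_t = rng.standard_normal((2, 4, 4))
b_h, b_t = np.zeros(4), np.zeros(4)
s_next = highway_step(x, s, W_h, R_h, b_h, W_t, R_t, b_t)
print(s_next.shape)  # (2, 4)
```

Note that when the transform gate saturates to zero, the state is carried through unchanged, which is what lets gradients flow across many micro-steps.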
For faster computation on a GPU, it is better to compute the linear transformations of the inputs for all time steps at once, as a single big matrix multiplication producing the all-steps input matrices i_for_H and i_for_T, since the GPU parallelizes one large product far better than many small ones. These precomputed inputs are then provided to the recurrency:
```python
y_0 = shared_zeros((batch_size, hidden_size))
y, _ = theano.scan(deep_step_fn,
                   sequences=[i_for_H, i_for_T],
                   outputs_info=[y_0],
                   non_sequences=[noise_s])
```
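The precomputation described above can be sketched in NumPy: flattening the time and batch dimensions lets one matrix multiplication produce the input projections for every step at once (the names `i_for_H`/`i_for_T` follow the scan call; the shapes and weights `W_H`, `W_T` are assumptions for illustration):

```python
import numpy as np

seq_len, batch_size, in_size, hidden_size = 5, 2, 3, 4
rng = np.random.default_rng(1)
X = rng.standard_normal((seq_len, batch_size, in_size))  # all-steps inputs
W_H = rng.standard_normal((in_size, hidden_size))
W_T = rng.standard_normal((in_size, hidden_size))

# One big matmul over (seq_len * batch_size) rows instead of seq_len small ones
flat = X.reshape(-1, in_size)
i_for_H = (flat @ W_H).reshape(seq_len, batch_size, hidden_size)
i_for_T = (flat @ W_T).reshape(seq_len, batch_size, hidden_size)

# Equivalent to multiplying step by step, but far friendlier to a GPU
stepwise = np.stack([X[t] @ W_H for t in range(seq_len)])
print(np.allclose(i_for_H, stepwise))  # True
```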
With a deep transition between each step:
```python
def deep_step_fn(i_for_H_t, i_for_T_t, y_tm1...
```
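The truncated step function could be completed along these lines — a NumPy sketch stacking several highway micro-steps inside a single time step (the parameter names and the handling of the dropout mask `noise_s` are assumptions, not the book's exact code; only the first micro-step sees the precomputed input projections):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def deep_step_fn(i_for_H_t, i_for_T_t, y_tm1, noise_s,
                 R_H, R_T, b_H, b_T):
    """Deep transition: several stacked highway micro-steps per time step.

    i_for_H_t, i_for_T_t : precomputed input projections for this time step
    y_tm1                : hidden state from the previous time step
    noise_s              : dropout mask applied to the recurrent state
    R_H, R_T, b_H, b_T   : lists of per-micro-step recurrent weights/biases
    """
    s = y_tm1
    for l in range(len(R_H)):
        s_masked = s * noise_s
        in_H = i_for_H_t if l == 0 else 0.0  # input enters the first micro-step only
        in_T = i_for_T_t if l == 0 else 0.0
        h = np.tanh(in_H + s_masked @ R_H[l] + b_H[l])
        t = sigmoid(in_T + s_masked @ R_T[l] + b_T[l])
        s = h * t + s * (1.0 - t)  # carry gate = 1 - transform gate
    return s
```

In the Theano version, a function with this structure is what gets passed to `theano.scan` as shown above.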