Deep transition recurrent network
Unlike a stacked recurrent network, a deep transition recurrent network increases the depth of the network along the time direction, by adding more layers, or micro-timesteps, inside the recurrent connection.
To illustrate this, recall the definition of the transition (the recurrent connection) in a recurrent network: it takes as input the previous state h_{t-1} and the input data x_t at time step t, and predicts the new state h_t.
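Written as a formula, in a notation assumed here for illustration (the function name f, and the tanh example with weights W, U and bias b, are not taken from the text above), this ordinary transition can be sketched as:

```latex
h_t = f(h_{t-1}, x_t), \qquad \text{for example } f(h_{t-1}, x_t) = \tanh(W x_t + U h_{t-1} + b)
```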
In a deep transition recurrent network (figure 2), the recurrent transition is developed with more than one layer, up to a recurrence depth L. The initial state of the transition is set to the output of the previous transition:
Inside the transition, multiple intermediate states, or micro-steps, are then computed, one per layer:
The final state is the output of the transition:
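The three steps above (initial state, intermediate micro-steps, final state) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference implementation: the names `deep_transition_step`, `init_params`, `run_sequence`, `W_in` and `layers` are hypothetical, each micro-step is assumed to be a simple tanh layer, and the input x_t is assumed to enter only the first micro-step (other designs feed it to every micro-step or use gated layers).

```python
import math
import random


def affine(U, s, b):
    """Return U @ s + b for a list-of-lists matrix U and list vectors s, b."""
    return [sum(u * v for u, v in zip(row, s)) + b_i for row, b_i in zip(U, b)]


def deep_transition_step(h_prev, x_t, W_in, layers):
    """One time step of a deep transition RNN; the depth L is len(layers)."""
    s = h_prev  # initial micro-state: output of the previous transition
    for l, (U, b) in enumerate(layers):
        pre = affine(U, s, b)
        if l == 0:
            # Assumption: the input x_t enters only the first micro-step.
            zero = [0.0] * len(pre)
            pre = [p + q for p, q in zip(pre, affine(W_in, x_t, zero))]
        s = [math.tanh(p) for p in pre]
    return s  # final micro-state becomes the new state h_t


def init_params(input_dim, hidden_dim, depth, seed=0):
    """Small random weights and zero biases (illustrative initialisation only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    W_in = mat(hidden_dim, input_dim)
    layers = [(mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
              for _ in range(depth)]
    return W_in, layers


def run_sequence(xs, h0, W_in, layers):
    """Apply the deep transition once per time step and collect the states."""
    h, states = h0, []
    for x_t in xs:
        h = deep_transition_step(h, x_t, W_in, layers)
        states.append(h)
    return states
```

With `depth=1` this reduces to an ordinary single-layer recurrent transition. Larger depths add micro-steps between two consecutive time steps without adding stacked layers.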