Summary
Dropout, a classic method for improving network robustness, can be applied to a recurrent network sequence-wise or batch-wise, reusing the same mask at every timestep, to avoid instability and destruction of the recurrent transition. For example, when applied to word inputs/outputs, it is equivalent to removing the same words everywhere in the sentence, replacing them with a blank value.
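As a minimal sketch (in PyTorch, assuming inputs shaped as sequence length, batch, features; the function name is illustrative), sequence-wise dropout samples one mask per sequence and broadcasts it over the time dimension, so the same units are dropped at every timestep:

```python
import torch

def sequence_wise_dropout(x, p=0.3, training=True):
    """Apply the same dropout mask at every timestep.

    x: tensor of shape (seq_len, batch, features).
    The mask is sampled once per batch element and reused across
    timesteps, so the recurrent transition is not perturbed
    differently at each step.
    """
    if not training or p == 0.0:
        return x
    # One mask of shape (1, batch, features), broadcast over seq_len.
    mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - p) / (1 - p)
    return x * mask
```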
The deep learning principle of stacking layers to improve accuracy also applies to recurrent networks, which can be stacked in the depth direction without difficulty.
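For illustration (again in PyTorch, with hyperparameter values chosen arbitrarily), stacking in depth simply feeds the hidden sequence of one recurrent layer as the input sequence of the next:

```python
import torch
import torch.nn as nn

# Two LSTM layers stacked in depth: the first layer's hidden sequence
# becomes the second layer's input sequence. Dropout here is applied
# only between the stacked layers, not inside the recurrent transition.
stacked_rnn = nn.LSTM(input_size=128, hidden_size=256, num_layers=2,
                      dropout=0.3)

x = torch.randn(20, 32, 128)          # (seq_len, batch, input_size)
outputs, (h_n, c_n) = stacked_rnn(x)
print(outputs.shape)                  # (20, 32, 256): top layer's states
```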
Applying the same stacking principle inside the recurrent transition aggravates the vanishing/exploding gradient problem, but this is offset by highway networks, whose identity connections let information carry through unchanged.
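A simplified sketch of such a highway-style transition step follows (PyTorch; the class and gate names are illustrative, not the exact formulation from the chapter). A transform gate mixes a new candidate state with the previous state carried through an identity path:

```python
import torch
import torch.nn as nn

class HighwayRecurrentCell(nn.Module):
    """One recurrent transition step with a highway (identity) connection."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev):
        xh = torch.cat([x, h_prev], dim=-1)
        c = torch.tanh(self.candidate(xh))   # candidate next state
        t = torch.sigmoid(self.gate(xh))     # transform gate
        # Carry gate = 1 - t: the previous state passes through unchanged,
        # which eases gradient flow through deep transitions.
        return t * c + (1.0 - t) * h_prev
```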
These advanced techniques for recurrent neural nets give state-of-the-art results in sequence prediction.