RNN variants
In this section, we will look at a couple of variations on the basic RNN architecture that can provide performance improvements in certain circumstances. Note that these strategies can be applied to different kinds of RNN cells, as well as to different RNN topologies, which we will learn about later.
Bidirectional RNNs
We have seen how, at any given time step t, the output of the RNN depends on the outputs at all previous time steps. However, it is entirely possible that the output also depends on future time steps. This is especially true for applications such as natural language processing, where the attributes of the word or phrase we are trying to predict may depend on the context given by the entire enclosing sentence, not just the words that came before it.
This problem can be solved using a bidirectional RNN, which is essentially two RNNs stacked on top of each other, one reading the input from left to right and the other reading it from right to left, with the outputs of the two combined at each time step.
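As a concrete illustration, here is a minimal sketch of a bidirectional model built with the Keras API (assuming TensorFlow 2.x). The vocabulary size, embedding dimension, and number of units are illustrative placeholders for a binary text-classification task, not values prescribed by this section.

```python
import tensorflow as tf

VOCAB_SIZE = 10000  # hypothetical vocabulary size
EMBED_DIM = 64      # hypothetical embedding dimension

model = tf.keras.Sequential([
    # Map integer word indices to dense vectors
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # The Bidirectional wrapper runs one LSTM over the sequence left to right
    # and a second LSTM right to left, then combines their outputs
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    # Binary prediction based on context from both directions
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

By default, the Bidirectional wrapper concatenates the forward and backward outputs; its merge_mode argument also allows summing or averaging them instead.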