Therefore, the bi-directional layer in Keras processes a sequence of data in both the normal and the reversed order, which lets words that appear later in the sequence inform our prediction at the current time step.
Essentially, the bi-directional layer duplicates any layer that's fed to it and uses one copy to process information in the normal sequential order, while the other processes the data in reverse order. Pretty neat, no? We can intuitively visualize what a bi-directional layer actually does by going through a simple example. Suppose you were modeling the two-word sequence What's up, with a bi-directional GRU:
To do this, you will nest the GRU in a bi-directional layer, which tells Keras to generate two copies of the GRU: one processing the sequence forward, the other in reverse. In the preceding image, we stacked two bi-directional layers on top of each...
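As a minimal sketch of this nesting, the snippet below wraps a GRU in Keras's Bidirectional layer; the vocabulary size, embedding dimension, and number of GRU units here are illustrative assumptions, not values from the text:

```python
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    # Map integer token ids to dense vectors (sizes are assumptions).
    layers.Embedding(input_dim=1000, output_dim=16),
    # Bidirectional duplicates the wrapped GRU: one copy reads the
    # sequence left-to-right, the other right-to-left, and their
    # outputs are concatenated (32 + 32 = 64 features).
    layers.Bidirectional(layers.GRU(32)),
    layers.Dense(1, activation="sigmoid"),
])

# A dummy batch of two 2-token sequences, standing in for "What's up".
batch = np.array([[4, 7], [12, 3]])
preds = model.predict(batch)
print(preds.shape)  # one prediction per sequence
```

Note that because the two directions are concatenated by default, the bi-directional GRU's output is twice as wide as the GRU you wrapped.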