Different RNN architectures
Despite the incredible success of RNNs in solving some of the hardest problems in sequential modeling and prediction, a key limitation is that they put too much emphasis on the most recent inputs. This means that the last input you feed into the network has a greater influence on the prediction than earlier timestamps. When the gradient values become too small, the model converges more slowly and stops learning from earlier inputs; this is what we call "memory" loss, or the vanishing gradient problem. As a result, RNNs are not good at remembering long-term associations in the data. Another issue with the vanilla RNN is that it only uses information from earlier in the sequence. To address these limitations, several variations of the RNN architecture have been proposed, such as the bidirectional RNN, long short-term memory (LSTM), and gated recurrent unit (GRU). We will briefly look at these popular RNN architectures.
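To make the comparison concrete, here is a minimal sketch, assuming TensorFlow/Keras (the text does not specify a framework), of how each of these variants is typically declared. The layer sizes, input shape, and model names are illustrative placeholders, not part of the original text.

```python
import tensorflow as tf

timesteps, features = 30, 8  # hypothetical sequence length and feature count

# Vanilla RNN: uses only past context and is prone to vanishing gradients
vanilla_rnn = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(1),
])

# LSTM: gated memory cell designed to retain long-term dependencies
lstm_model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])

# GRU: a lighter-weight gated cell with fewer parameters than the LSTM
gru_model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.GRU(32),
    tf.keras.layers.Dense(1),
])

# Bidirectional RNN: processes the sequence forward and backward, so the
# prediction can also draw on context from later in the sequence
bidirectional_model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
])
```

Each model above maps a sequence of shape (timesteps, features) to a single output; only the recurrent layer changes, which is what the following sections examine in turn.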