Like the LSTM, the GRU improves on the hidden cells of vanilla RNNs; it, too, was created to address the vanishing gradient problem by storing memory from the past to help make better decisions in the future. The motivation for the GRU came from asking whether all of the components present in the LSTM are really necessary for controlling both the forgetting and the time scale of each unit's state.
The main difference is that this architecture uses a single gating unit to decide both what to forget and when to update the state, which gives it a more persistent memory.
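To make this concrete, here is a minimal NumPy sketch of a single GRU step using the standard update-gate/reset-gate formulation. The weight names, shapes, and the `gru_step` helper are illustrative assumptions rather than anything taken from the diagram or the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """Compute the next hidden state h_t from the current input x_t
    and the previous hidden state h_{t-1} (hypothetical parameter layout)."""
    W_z, U_z, b_z = params["z"]   # update gate weights
    W_r, U_r, b_r = params["r"]   # reset gate weights
    W_h, U_h, b_h = params["h"]   # candidate-state weights

    # Update gate: decides how much of the old state to keep
    # versus how much of the new candidate to write in.
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)

    # Reset gate: decides how much of the previous state to expose
    # when forming the candidate state.
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)

    # Candidate hidden state.
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)

    # A single gate (z_t) controls both forgetting the old state
    # and updating with the new candidate.
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde
    return h_t
```

Notice that the final line is where the single gating unit does double duty: `(1 - z_t)` scales how much of the previous state is forgotten, while `z_t` scales how much of the candidate state is written in, so there is no separate forget gate as in the LSTM.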
In the following diagram, you can see what the GRU architecture looks like:
As you can see in the preceding diagram, the GRU takes in the current input (Xt) and the previous hidden state (Ht-1), and far fewer operations take place here than in the LSTM we saw earlier. It has the...