Gated recurrent unit (GRU)
In 2014, Cho et al. proposed another variant of the RNN, called a gated recurrent unit (GRU), which has a much simpler structure than an LSTM. The intuition is similar to that of the LSTM: we use a set of gates to regulate the information that flows through the cell. However, a GRU eliminates the long-term memory component and uses just the hidden state to propagate information. So, instead of the memory cell acting as the gradient highway, the hidden state itself becomes the “gradient highway.” Keeping the same notation convention we used in the previous section, let’s look at the updated equations for a GRU.
While we had three gates in an LSTM, we only have two in a GRU:
- Reset gate: This gate decides how much of the previous hidden state will be used when computing the candidate hidden state of the current timestep. The equation for this is:

  $$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
- Update gate: The update gate decides how much of the previous hidden state is carried over to the current hidden state and how much of the candidate hidden state is mixed in. The equation for this is:

  $$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$

  (See the sketch after this list for how the gates combine with the candidate hidden state to produce the new hidden state.)
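To make the gating concrete, here is a minimal NumPy sketch of a single GRU timestep. The weight names (`W_r`, `U_r`, `b_r`, and so on) and the convention used to blend the previous hidden state with the candidate are assumptions for illustration, not the exact notation from the previous section.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU timestep: returns the new hidden state h_t.

    params holds illustrative weight matrices and biases:
      W_r, U_r, b_r -- reset gate
      W_z, U_z, b_z -- update gate
      W_h, U_h, b_h -- candidate hidden state
    """
    # Reset gate: how much of the previous hidden state feeds the candidate.
    r_t = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])
    # Update gate: how much of the previous hidden state is kept.
    z_t = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])
    # Candidate hidden state, computed from the reset-gated previous state.
    h_tilde = np.tanh(
        params["W_h"] @ x_t + params["U_h"] @ (r_t * h_prev) + params["b_h"]
    )
    # Blend old state and candidate; with this convention, z_t close to 1
    # keeps more of the previous hidden state.
    return z_t * h_prev + (1.0 - z_t) * h_tilde

# Tiny usage example with randomly initialized parameters.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
params = {}
for gate in ("r", "z", "h"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, input_size))
    params[f"U_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a 5-step input sequence
    h = gru_step(x, h, params)
print(h)
```

Because there is no separate memory cell, the hidden state returned here is both the output at this timestep and the only state passed to the next one, which is exactly why it serves as the “gradient highway.”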