Gated Recurrent Units
In the previous section, we saw that LSTMs have a lot of parameters and seem much more complex than the regular RNN. You may be wondering: are all these apparent complications really necessary? Can the LSTM be simplified a little without losing significant predictive power? Researchers wondered the same for a while, and in 2014, Kyunghyun Cho and colleagues proposed the GRU as an alternative to LSTMs in their paper (https://arxiv.org/abs/1406.1078) on machine translation.
GRUs are simplified forms of LSTMs that aim to reduce the number of parameters while retaining the LSTM's power. On tasks such as speech modeling and language modeling, GRUs deliver performance comparable to LSTMs, but with fewer parameters and faster training times.
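To make the parameter savings concrete, here is a minimal sketch, assuming PyTorch is available; the layer sizes (input size 128, hidden size 256) are arbitrary and chosen only for illustration. It compares the trainable parameter counts of an LSTM layer and a GRU layer of the same size.

```python
# A minimal sketch (assumes PyTorch); input_size and hidden_size are
# arbitrary illustrative values, not taken from the text.
import torch.nn as nn

def count_params(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=128, hidden_size=256)
gru = nn.GRU(input_size=128, hidden_size=256)

print(f"LSTM parameters: {count_params(lstm):,}")  # four gate-sized weight blocks
print(f"GRU parameters:  {count_params(gru):,}")   # only three, so roughly 25% fewer
```

Because the GRU has three gate-sized weight blocks where the LSTM has four, a GRU layer of the same width carries roughly three-quarters as many parameters, which is where the faster training comes from.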
One major simplification in the GRU is the omission of an explicit cell state. This sounds counterintuitive, considering that the freely flowing cell state was what gave the LSTM its power, right? What really gave...