Before we close this chapter, we will briefly look at GRUs and stacked LSTMs.
GRUs
As we saw, LSTMs are large networks with many parameters, and training requires updating all of those parameters, which is computationally expensive. Can we do better?
Yes! GRUs can help us here.
GRUs use only two gates instead of the three we used in LSTMs. The update gate merges the roles of the LSTM's forget gate and input gate: it decides how much of the previous hidden state to keep and how much to replace with newly computed information. The other gate is the reset gate, which decides how much of the previous hidden state is used when computing that new information. The new information itself is held in something called the content state (often called the candidate state); based on the outputs of the two gates, the cell interpolates between the old hidden state and the content state to produce both its output and the updated hidden state. As a result, the number of parameters in the network is drastically reduced.
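To make the gate arithmetic concrete, here is a minimal NumPy sketch of one GRU forward step. The weight names (Wz, Uz, and so on) and the interpolation convention are illustrative choices for this sketch, not any particular library's API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One forward step of a single GRU cell (illustrative weight names)."""
    # Update gate z: merges the LSTM's forget/input roles -- how much of
    # the old state to keep versus overwrite with new content.
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)
    # Reset gate r: how much of the old state feeds into the new content.
    r = sigmoid(Wr @ x + Ur @ h_prev + br)
    # Content (candidate) state: the newly proposed information.
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)
    # New hidden state: interpolate between the old state and the new
    # content. (Some formulations swap z and 1 - z; it is only a convention.)
    return (1.0 - z) * h_prev + z * h_tilde

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d, h = 8, 16                                  # input and hidden sizes
def mk(m, n): return 0.1 * rng.standard_normal((m, n))
Wz, Wr, Wh = mk(h, d), mk(h, d), mk(h, d)
Uz, Ur, Uh = mk(h, h), mk(h, h), mk(h, h)
bz, br, bh = np.zeros(h), np.zeros(h), np.zeros(h)
h_t = gru_cell(rng.standard_normal(d), np.zeros(h),
               Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh)
print(h_t.shape)  # (16,)
```

Note where the savings come from: the cell needs three weight blocks (for the update gate, the reset gate, and the content state) where an LSTM needs four (input, forget, and output gates plus the candidate). For input size d and hidden size h, that is roughly 3(h(d + h) + h) parameters per GRU layer versus 4(h(d + h) + h) for an LSTM layer of the same size.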