Training a GRU
In this recipe, we will continue exploring RNNs with the GRU (Gated Recurrent Unit) – what it is, how it works, and how to train such a model.
Getting started
One of the main limitations of RNNs is how poorly the network retains memory across its time steps. GRUs try to overcome this limit by adding gates that control what the hidden state keeps and what it discards.
If we take a step back and describe an RNN cell with a simple diagram, it could look like Figure 8.5.
Figure 8.5 – A diagram of an RNN cell
So basically, at each step $t$, there are both a hidden state $h_{t-1}$ and a set of features $x_t$. They are concatenated, then weights $W$ are applied along with an activation function $g$, resulting in a new hidden state $h_t = g(W[h_{t-1}, x_t] + b)$. Optionally, an output $y_t$ is computed from this hidden state, and so on.
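To make this step concrete, here is a minimal sketch of a single RNN step in PyTorch (the framework is assumed here; the names `rnn_step`, `W`, and `b`, and the toy dimensions, are illustrative, not from the recipe):

```python
import torch

input_size, hidden_size = 4, 8  # illustrative sizes, not from the recipe

# One weight matrix applied to the concatenation [h_{t-1}, x_t], plus a bias
W = torch.randn(hidden_size, hidden_size + input_size)
b = torch.zeros(hidden_size)

def rnn_step(h_prev: torch.Tensor, x_t: torch.Tensor) -> torch.Tensor:
    """One RNN step: concatenate the previous hidden state with the
    current features, apply the weights, then the activation g (tanh here)."""
    concat = torch.cat([h_prev, x_t])
    return torch.tanh(W @ concat + b)

h = torch.zeros(hidden_size)             # initial hidden state h_0
for x_t in torch.randn(5, input_size):   # a toy sequence of 5 steps
    h = rnn_step(h, x_t)                 # h becomes h_t at each step
```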
But what if this step's features are not relevant? Or what if it would be useful for the network to fully remember this hidden state from time to time? This is exactly what a GRU does, by adding a new set of parameters through what is called the update gate, which lets the cell decide how much of the previous hidden state to carry forward.
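As an illustration, here is a minimal sketch of a GRU step, assuming the standard formulation with an update gate and a reset gate (the names `W_z`, `W_r`, `W_h`, and `gru_step` are illustrative, biases are omitted, and sign conventions for the update gate vary across references):

```python
import torch

input_size, hidden_size = 4, 8  # illustrative sizes, not from the recipe

# One weight matrix per gate, each applied to the concatenation [h_{t-1}, x_t]
W_z = torch.randn(hidden_size, hidden_size + input_size)  # update gate weights
W_r = torch.randn(hidden_size, hidden_size + input_size)  # reset gate weights
W_h = torch.randn(hidden_size, hidden_size + input_size)  # candidate state weights

def gru_step(h_prev: torch.Tensor, x_t: torch.Tensor) -> torch.Tensor:
    """One GRU step (biases omitted for brevity)."""
    concat = torch.cat([h_prev, x_t])
    z = torch.sigmoid(W_z @ concat)  # update gate: how much old state to keep
    r = torch.sigmoid(W_r @ concat)  # reset gate: how much old state feeds the candidate
    h_cand = torch.tanh(W_h @ torch.cat([r * h_prev, x_t]))  # candidate hidden state
    # z close to 1 carries h_{t-1} through almost unchanged ("remember fully")
    return z * h_prev + (1 - z) * h_cand
```

In practice, you would rely on the framework's built-in implementation (for example, `torch.nn.GRU` in PyTorch) rather than hand-writing the gates; the sketch only shows where the extra set of parameters enters the computation.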