Training an RNN model
To explain how we optimize the weights (parameters) of an RNN, we first annotate the weights and the data on the network, as follows (a short code sketch after the list makes these definitions concrete):
- U denotes the weights connecting the input layer and the hidden layer.
- V denotes the weights between the hidden layer and the output layer. Note here that we use only one recurrent layer for simplicity.
- W denotes the weights of the recurrent layer; that is, the feedback layer.
- x_t denotes the inputs at time step t.
- s_t denotes the hidden state at time step t.
- h_t denotes the outputs at time step t.
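Here is a minimal NumPy sketch of these annotations; the dimensions `input_size`, `hidden_size`, and `output_size` are hypothetical, chosen only for illustration:

```python
import numpy as np

input_size, hidden_size, output_size = 4, 8, 3  # hypothetical dimensions

# Weight matrices, matching the annotations above
U = np.random.randn(hidden_size, input_size)   # input layer -> hidden layer
W = np.random.randn(hidden_size, hidden_size)  # recurrent (feedback) layer
V = np.random.randn(output_size, hidden_size)  # hidden layer -> output layer

x_t = np.random.randn(input_size)  # inputs at time step t
s_t = np.zeros(hidden_size)        # hidden state at time step t
h_t = np.zeros(output_size)        # outputs at time step t
```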
Next, we unfold the simple RNN model over three time steps: t − 1, t, and t + 1, as follows:
Figure 12.9: Unfolding a recurrent layer
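The unfolding can be mirrored in code. The following sketch rolls the recurrence forward over three time steps, reusing U, W, and V from the previous snippet; it assumes a tanh hidden activation (discussed below) and, as a further assumption, a softmax output activation:

```python
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Three consecutive inputs for time steps t-1, t, and t+1 (random, for illustration)
inputs = [np.random.randn(input_size) for _ in range(3)]

s_prev = np.zeros(hidden_size)  # initial hidden state before step t-1
for x in inputs:
    s = np.tanh(U @ x + W @ s_prev)  # hidden state: s_t = a(U x_t + W s_{t-1})
    h = softmax(V @ s)               # output: h_t = softmax(V s_t)
    s_prev = s                       # feed the state forward to the next step
```

Note that the same U, W, and V are applied at every time step; unfolding does not create new weights, it shares them across time.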
We describe the mathematical relationships between the layers as follows:
- We let a denote the activation function for the hidden layer. In RNNs, we usually choose tanh or ReLU for the hidden layers...