We know that a deep neural network is a network with many hidden layers. Similarly, a deep RNN has more than one hidden layer. But how are the hidden states computed when there is more than one hidden layer? We know that an RNN computes its hidden state from the current input and the previous hidden state, so how are the hidden states in the later layers computed?
For instance, let's see how $h_1^2$ in hidden layer 2 is computed. It takes the previous hidden state, $h_0^2$, and the previous layer's output, $h_1^1$, as inputs to compute $h_1^2$.
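The same rule can be written as a single equation. Write $h_t^l$ for the hidden state of layer $l$ at time step $t$. The following assumes a vanilla RNN cell with a tanh activation; the weight names $U^l$ (input from the layer below), $W^l$ (this layer's previous hidden state) and the bias $b^l$ are labels chosen here for illustration:

$$
h_t^{l} = \tanh\left(U^{l}\, h_t^{l-1} + W^{l}\, h_{t-1}^{l} + b^{l}\right), \qquad h_t^{0} = x_t
$$

Setting $l = 2$ and $t = 1$ gives $h_1^2 = \tanh\left(U^{2} h_1^1 + W^{2} h_0^2 + b^{2}\right)$. The first layer sees the raw input $x_t$, and every later layer sees the output of the layer below at the same time step.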
Thus, in an RNN with more than one hidden layer, each later hidden layer computes its hidden state from its own previous hidden state and the current output of the layer below, as shown in the following diagram:
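To make the computation concrete, here is a minimal NumPy sketch of a deep RNN's forward pass. It assumes a vanilla tanh cell and zero initial hidden states. The function name `deep_rnn_forward` and the `(U, W, b)` parameter layout are hypothetical choices for illustration, not a specific library's API. Each layer runs over the whole sequence, and its outputs become the inputs of the next layer.

```python
import numpy as np


def deep_rnn_forward(X, params, activation=np.tanh):
    """Forward pass of a stacked (deep) vanilla RNN (illustrative sketch).

    X: array of shape (T, input_dim), one input vector per time step.
    params: hypothetical list with one (U, W, b) tuple per hidden layer:
        U maps the output of the layer below into this layer,
        W maps this layer's previous hidden state,
        b is the bias.
    Returns a list with one array per layer; array l has shape
    (T, hidden_dim_l) and row t holds that layer's hidden state after
    the (t+1)-th input (the zero initial state is not stored).
    """
    T = X.shape[0]
    layer_input = X  # layer 1 receives the raw inputs x_t
    all_states = []
    for U, W, b in params:
        hidden_dim = W.shape[0]
        h_prev = np.zeros(hidden_dim)  # assumed zero initial hidden state
        states = np.zeros((T, hidden_dim))
        for t in range(T):
            # Combine the layer below's output at this time step
            # with this layer's own previous hidden state.
            h_prev = activation(U @ layer_input[t] + W @ h_prev + b)
            states[t] = h_prev
        all_states.append(states)
        layer_input = states  # this layer's outputs feed the next layer
    return all_states


# Usage: a 2-layer deep RNN with random (untrained) weights.
rng = np.random.default_rng(0)
input_dim, hidden_dims = 3, [4, 5]
params = []
in_dim = input_dim
for h in hidden_dims:
    params.append((rng.normal(size=(h, in_dim)) * 0.1,  # U: layer below -> this layer
                   rng.normal(size=(h, h)) * 0.1,       # W: previous state -> this layer
                   np.zeros(h)))                        # b: bias
    in_dim = h
X = rng.normal(size=(6, input_dim))  # a sequence of 6 time steps
states = deep_rnn_forward(X, params)
print([s.shape for s in states])  # [(6, 4), (6, 5)]
```

Processing the network layer by layer keeps the code simple. A real implementation could equally loop over time steps in the outer loop and over layers in the inner loop, because each state depends only on the layer below at the same step and on its own layer at the previous step.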