The next figure shows the unfolded version of an RNN, obtained by unrolling the network structure over the entire input sequence, one copy per discrete time step. It is immediately clear that this differs from a typical multi-layer neural network, which uses different parameters at each layer: an RNN uses the same parameters, U, V, and W, at every time step.
Indeed, an RNN performs the same computation at every time step, applied to a different input from the same sequence. By sharing parameters, an RNN also greatly reduces the number of parameters the network must learn during the training phase, which in turn shortens training times.
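To make the parameter sharing concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy. The dimensions, random initialization, and tanh activation are illustrative assumptions, not taken from the text; the point is that the same U, W, and V are reused at every time step.

```python
import numpy as np

# Illustrative sizes, chosen only for the example
n_inputs, n_hidden, n_outputs = 4, 8, 3

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_hidden, n_inputs))   # input-to-hidden
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # hidden-to-hidden
V = rng.normal(scale=0.1, size=(n_outputs, n_hidden))  # hidden-to-output

def rnn_forward(xs):
    """Run the RNN over a sequence: the same U, W, V are applied
    at every time step -- the parameter sharing described above."""
    h = np.zeros(n_hidden)
    hidden_states, outputs = [], []
    for x in xs:                        # one iteration per time step
        h = np.tanh(U @ x + W @ h)      # same U and W at every step
        y = V @ h                       # same V at every step
        hidden_states.append(h)
        outputs.append(y)
    return hidden_states, outputs

sequence = [rng.normal(size=n_inputs) for _ in range(5)]
hidden_states, outputs = rnn_forward(sequence)
```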
Looking at this unfolded version, it becomes evident that networks of this type can be trained with the backpropagation algorithm, with only a small change.
In fact, because the parameters are shared across all time steps, the gradient computed at each output depends not only on the current time step but also on the previous ones. This variant of the algorithm is known as backpropagation through time (BPTT).
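The following sketch, reusing the definitions from the forward-pass example above (and assuming a squared-error loss, which is an illustrative choice), shows what that small change looks like: the backward pass walks the unrolled graph in reverse, and the gradients of the shared parameters U, W, and V are accumulated across all time steps, with dh_next carrying the dependence on later steps back through time.

```python
def rnn_bptt(xs, targets):
    """Backpropagation through time for the vanilla RNN sketched above.
    Assumes a squared-error loss between each output and its target."""
    # Forward pass, storing every hidden state of the unrolled network
    h = np.zeros(n_hidden)
    hs, ys = [np.zeros(n_hidden)], []   # hs[0] is the initial state
    for x in xs:
        h = np.tanh(U @ x + W @ h)
        hs.append(h)
        ys.append(V @ h)

    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dh_next = np.zeros(n_hidden)
    # Walk the unrolled graph backwards: each step's hidden gradient
    # includes dh_next, the contribution flowing back from later steps.
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]             # gradient of squared error
        dV += np.outer(dy, hs[t + 1])       # accumulated over time steps
        dh = V.T @ dy + dh_next
        da = dh * (1.0 - hs[t + 1] ** 2)    # back through tanh
        dU += np.outer(da, xs[t])
        dW += np.outer(da, hs[t])           # depends on the previous state
        dh_next = W.T @ da                  # pass gradient further back
    return dU, dW, dV                       # summed over all time steps
```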