An RNN is a unique network because of its ability to remember past inputs. This ability makes it well suited to problems involving sequential data, such as time series forecasting, speech recognition, machine translation, and audio and video sequence prediction. In an RNN, data traverses the network in such a way that, at each step, the network learns from both the current and previous inputs, sharing the same weights over time. It is like performing the same task at each step, just with different inputs; this weight sharing reduces the total number of parameters we need to learn.
For example, if the activation function is tanh, the weight at the recurrent neuron is W_R and the weight at the input neuron is W_x. Here, we can write the equation for the state, h_t, at time t as follows:

h_t = tanh(W_R h_{t-1} + W_x x_t)
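As a minimal sketch of this state update (the dimensions, the random data, and the names W_x, W_R, and rnn_step are illustrative assumptions, not from the text), the same weights are applied at every step of the sequence:

```python
import numpy as np

np.random.seed(0)

# Hypothetical sizes for illustration
input_size, hidden_size = 3, 4

# W_x: weight at the input neuron, W_R: weight at the recurrent neuron
W_x = np.random.randn(hidden_size, input_size) * 0.1
W_R = np.random.randn(hidden_size, hidden_size) * 0.1

def rnn_step(x_t, h_prev):
    """One recurrent step: h_t = tanh(W_R h_{t-1} + W_x x_t)."""
    return np.tanh(W_R @ h_prev + W_x @ x_t)

# Unroll over a short sequence, reusing the SAME W_x and W_R at every step
sequence = [np.random.randn(input_size) for _ in range(5)]
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_step(x_t, h)

print(h.shape)  # the hidden state keeps a fixed size as the sequence unrolls
```

Note that only two weight matrices exist no matter how long the sequence is, which is the parameter saving described above.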
The gradient at each output depends on the computations of the current and previous time steps.