In their more utopian description, RNNs are able to do something that the networks we've covered so far cannot: remember. More precisely, in a simple RNN with a single hidden layer, the state of that hidden layer at one timestep (which, in the simplest case, is also the network's output for that step) is combined with the next element of the training sequence to form the input for the next timestep. Each timestep behaves like a fresh copy of the network, but every copy shares the same trainable weights. A vanilla RNN can be visualized as follows:
Let's unpack this a bit. The two networks in the preceding diagram are two different representations of the same thing. The first is the rolled form, which is simply an abstract representation of the computation graph: an arbitrary number of timesteps is collapsed into a single cell indexed by (t). The second is the unrolled RNN, which lays out one copy of that cell per timestep, all sharing the same weights. This unrolled form is what we work with as we feed the network data and train it.
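To make the rolled/unrolled distinction concrete, here is a minimal pure-Python sketch. It assumes an Elman-style vanilla RNN with a tanh activation, a zero initial hidden state, and no separate output layer. Under those assumptions, each step computes `h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b_h)`. The function names `rnn_step` and `rnn_unrolled`, the parameter names `W_xh`, `W_hh` and `b_h`, and the hand-picked weights are all hypothetical and purely illustrative. The body of `rnn_step` plays the role of the single rolled cell. The loop in `rnn_unrolled` is the unrolled view, in which every timestep reuses the same weights.

```python
import math


# Hypothetical helper: one step of a vanilla (Elman-style) RNN, assuming a
# tanh activation: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    h_t = []
    for i in range(len(b_h)):
        # Bias + input contribution + recurrent contribution for hidden unit i
        s = b_h[i]
        s += sum(w * x for w, x in zip(W_xh[i], x_t))
        s += sum(w * h for w, h in zip(W_hh[i], h_prev))
        h_t.append(math.tanh(s))
    return h_t


# Hypothetical helper: the "unrolled" view. Each loop iteration is one
# timestep, and every iteration reuses the same W_xh, W_hh and b_h.
def rnn_unrolled(xs, W_xh, W_hh, b_h):
    h = [0.0] * len(b_h)  # initial hidden state h_0, assumed to be zeros
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Illustrative, hand-picked weights: 2 inputs, 3 hidden units
W_xh = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W_hh = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1]]
b_h = [0.0, 0.1, -0.1]

# Two sequences that end with the same element but start differently
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [0.0, 1.0]]

states_a = rnn_unrolled(seq_a, W_xh, W_hh, b_h)
states_b = rnn_unrolled(seq_b, W_xh, W_hh, b_h)

# The final hidden states differ because each one carries information
# about the earlier, different first element of its sequence.
print(states_a[-1])
print(states_b[-1])
```

The two sequences end with the same element but start differently, and they produce different final hidden states. This is the sense in which the network "remembers." A practical implementation would use a tensor library and learn the weights with backpropagation through time rather than fixing them by hand.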
For a given forward pass, this network takes two inputs, where X is a representation...