Summary
In this chapter, we looked at RNNs, which differ from conventional feed-forward neural networks and are better suited to solving temporal tasks.
Specifically, we discussed how to arrive at an RNN starting from a feed-forward neural network structure.
We assumed a sequence of inputs and outputs, and designed a computational graph capable of representing that sequence.
This computational graph resulted in a series of copies of a function, one applied to each input-output tuple in the sequence. Then, by generalizing this model to any single time step t in the sequence, we arrived at the basic computational graph of an RNN. We also discussed the exact equations and update rules used to calculate the hidden state and the output.
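For reference, a minimal NumPy sketch of one such time step is shown below. The weight names, dimensions, and activation choices here are illustrative assumptions and may differ from the chapter's exact notation.

import numpy as np

# One RNN time step: the hidden state combines the current input with the
# previous hidden state, and the output is computed from the new hidden state.
def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # hidden state update
    y_t = W_hy @ h_t + b_y                           # output at time t
    return h_t, y_t

# Example usage with small, randomly initialized weights (illustrative only)
input_dim, hidden_dim, output_dim = 4, 8, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
W_hy = rng.normal(size=(output_dim, hidden_dim))
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)

h = np.zeros(hidden_dim)                     # initial hidden state
for x in rng.normal(size=(5, input_dim)):    # a sequence of five inputs
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy, b_h, b_y)

Unrolling the loop over the sequence reproduces the series of function copies described above, with the same weights shared across all time steps.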
Next, we discussed how RNNs are trained on data using BPTT. We examined how BPTT can be derived from standard backpropagation, as well as why we can’t use standard backpropagation...