RNNs for modeling sequences
In this section, before we start implementing RNNs in PyTorch, we will discuss the main concepts of RNNs. We will begin by looking at the typical structure of an RNN, which includes a recurrent component to model sequence data. Then, we will examine how the neuron activations are computed in a typical RNN. This will set the context for discussing the common challenges in training RNNs, and solutions to these challenges such as long short-term memory (LSTM) and gated recurrent units (GRUs).
Understanding the dataflow in RNNs
Let’s start with the architecture of an RNN. Figure 15.3 shows the dataflow in a standard feedforward NN and in an RNN side by side for comparison:
Figure 15.3: The dataflow of a standard feedforward NN and an RNN
Both of these networks have only one hidden layer. In this representation, the units are not displayed, but we assume that the input layer (x), hidden layer (h), and output layer (o) are vectors that contain many units.
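To make this distinction concrete, here is a minimal PyTorch sketch (the layer sizes are made up purely for illustration) contrasting the two dataflows: a feedforward layer computes the hidden activation from the current input alone, while a recurrent cell also feeds the previous hidden state back in at every time step:

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

# Hypothetical sizes, chosen only for this illustration
input_size, hidden_size, seq_len = 5, 3, 4

ff_layer = nn.Linear(input_size, hidden_size)    # feedforward hidden layer
rnn_cell = nn.RNNCell(input_size, hidden_size)   # recurrent hidden layer

x_seq = torch.randn(seq_len, 1, input_size)      # toy sequence: (time, batch=1, features)
h = torch.zeros(1, hidden_size)                  # initial hidden state, h(0)

for t in range(seq_len):
    h_ff = torch.tanh(ff_layer(x_seq[t]))  # depends on x(t) only
    h = rnn_cell(x_seq[t], h)              # depends on x(t) AND h(t-1)
    print(f't={t}  feedforward: {h_ff.detach().numpy()}'
          f'  recurrent h(t): {h.detach().numpy()}')
```

Running this, the feedforward output at each step depends only on that step's input, whereas the recurrent hidden state carries information forward from all earlier steps through h(t-1); this recurrent edge is exactly what distinguishes the RNN from the standard feedforward NN in Figure 15.3.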