- What are the main advantages of LSTMs over the simple RNN architecture?
LSTMs suffer less from gradient vanishing and are more capable of storing long-term relationships in recurrent data. While they require more computing power, this usually leads to better predictions.
- How is a CNN used when it is applied before the LSTM?
The CNN acts as a feature extractor and reduces the dimensionality of the input data. By applying a pretrained CNN, we extract meaningful features from the input images. The LSTM trains faster since those features have a much smaller dimensionality than the input image.
- What is vanishing gradient and why does it occur? Why is it a problem?
When backpropagating the error in RNNs, we need to go back through the time steps as well. If there are many time steps, the information slowly fades away due to the way in which the gradient is computed. It is a problem since it makes it harder for the network to learn how to generate good predictions.
- What are...