Long short-term memory (LSTM) networks
In 1997, Hochreiter and Schmidhuber proposed a modification of the classical RNN: the LSTM network. It aimed to mitigate the vanishing and exploding gradient problems of vanilla RNNs. The design of the LSTM was inspired by the logic gates of a computer. It introduces a new component, called a memory cell, which serves as long-term memory and is used in addition to the hidden-state memory of classical RNNs. In an LSTM, multiple gates are tasked with reading from, adding to, and forgetting information in this memory cell. The memory cell acts as a gradient highway, allowing gradients to pass relatively unhindered through the network across many timesteps. This is the key innovation that mitigates vanishing gradients in RNNs.
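To make the idea of gated reading, adding, and forgetting concrete, here is a minimal NumPy sketch of a single LSTM step. It assumes the commonly used formulation with input, forget, and output gates; the function name lstm_step, the stacked parameters W, U, and b, and the gate ordering are illustrative choices, not taken from the text. The line to notice is the additive cell update: the old cell state is only scaled elementwise by the forget gate, so the gradient flowing back along the cell path is scaled by that gate rather than by a full weight matrix and a squashing nonlinearity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step (illustrative sketch; names and layout are assumptions).

    W: (4*hidden, n_in) input weights, U: (4*hidden, hidden) recurrent
    weights, b: (4*hidden,) biases, stacked as [input, forget, output, candidate].
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b            # all pre-activations at once
    i = sigmoid(z[0 * hidden:1 * hidden])   # input gate: how much to add
    f = sigmoid(z[1 * hidden:2 * hidden])   # forget gate: how much to keep
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much to read out
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate values to write
    c_t = f * c_prev + i * g                # additive update of the memory cell
    h_t = o * np.tanh(c_t)                  # hidden state read from the cell
    return h_t, c_t, f

# Usage: random parameters, a few steps over a random input sequence.
rng = np.random.default_rng(0)
n_in, hidden = 4, 3
W = rng.normal(scale=0.1, size=(4 * hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, n_in)):
    h, c, f = lstm_step(x, h, c, W, U, b)
```

Along the cell path alone, the derivative of c_t with respect to c_prev is simply f, elementwise. When the forget gate stays close to 1, that factor stays close to 1 across many timesteps, which is the sense in which the cell behaves like a highway for gradients.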
LSTM architecture
Let’s imagine that the input to the LSTM at time t is x_t, and the hidden state from the previous timestep is H_{t-1}. Now, there are three gates that process information. Each gate is nothing but two learnable weight matrices...