Understanding LSTM
LSTM was introduced in 1997 by Hochreiter and Schmidhuber, yet it remains a widely adopted neural network architecture. LSTM uses the tanh activation function because it provides nonlinearity while keeping values bounded between -1 and 1, which helps to prevent activations from exploding; together with the gated cell state, this allows gradient signals to be preserved across longer sequences, mitigating the vanishing gradient problem. An LSTM layer is made up of LSTM cells connected sequentially. Let's take an in-depth look at what an LSTM cell looks like in Figure 4.1.
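Before we walk through the figure, here is a minimal sketch in NumPy of the computations a single LSTM cell performs at one time step. The function and variable names (lstm_cell_step, the input/forget/cell/output gate ordering, and so on) are illustrative choices following the standard LSTM formulation, not names defined in this chapter; note where tanh appears.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W: (4h, d) input weights, U: (4h, h) recurrent
    weights, b: (4h,) biases, stacked in [input, forget, cell, output] order."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b           # all four pre-activations at once
    i = sigmoid(z[0*hidden:1*hidden])      # input gate
    f = sigmoid(z[1*hidden:2*hidden])      # forget gate
    g = np.tanh(z[2*hidden:3*hidden])      # candidate cell state, bounded in (-1, 1) by tanh
    o = sigmoid(z[3*hidden:4*hidden])      # output gate
    c_t = f * c_prev + i * g               # new cell state
    h_t = o * np.tanh(c_t)                 # new hidden state; tanh again bounds the output
    return h_t, c_t

# A toy layer: run the cell over a sequence, threading h and c from one
# time step to the next, which is what "sequentially connected" means here.
rng = np.random.default_rng(0)
d, hsz, T = 3, 4, 5                        # input size, hidden size, sequence length
W = rng.normal(0, 0.1, (4 * hsz, d))
U = rng.normal(0, 0.1, (4 * hsz, hsz))
b = np.zeros(4 * hsz)
h, c = np.zeros(hsz), np.zeros(hsz)
for t in range(T):
    x_t = rng.normal(size=d)
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
print(h)                                   # hidden state after the last time step
```

The two tanh applications keep both the candidate cell state and the emitted hidden state bounded, which is what the preceding paragraph refers to; the sigmoid gates decide how much information to write, forget, and expose.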
Figure 4.1 – A visual deep dive into an LSTM cell among a sequence of LSTM cells that forms an LSTM layer
The first LSTM cell on the left depicts the high-level structure of an LSTM cell, the second cell in the middle depicts the medium-level operations, connections, and structure of an LSTM cell, and the third cell on the right is another LSTM cell included to emphasize that an LSTM layer is made up of multiple LSTM cells connected sequentially. Think of an LSTM cell as containing...