While regular RNNs can theoretically ingest information from long sequences, such as full documents, in practice they are limited in how far back they can look when learning. To overcome this, researchers developed variants of the traditional RNN that use a unit called a memory cell, which helps the network retain important information; these variants were designed specifically to address the vanishing gradient problem that occurs in traditional RNN models. There are two main RNN variations that use memory cell architectures, known as the GRU and the LSTM. Because they are the most widely used RNN architectures, we'll pay particular attention to their mechanics.
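Before we look at their internals in the sections below, a minimal sketch may help show how these layers are used in practice. This example assumes a PyTorch setup (not drawn from this text): the LSTM and GRU layers are drop-in replacements for a plain RNN layer, differing mainly in the gated memory state they carry.

```python
import torch
import torch.nn as nn

# Hypothetical toy batch: 4 sequences, 10 time steps, 8 features per step.
x = torch.randn(4, 10, 8)

# A plain RNN layer, prone to vanishing gradients over long sequences.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# LSTM and GRU layers add gated memory cells but take the same arguments.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

out_rnn, h_rnn = rnn(x)               # final hidden state only
out_lstm, (h_lstm, c_lstm) = lstm(x)  # hidden state plus a separate cell state
out_gru, h_gru = gru(x)               # GRU folds memory into one hidden state

print(out_lstm.shape)  # torch.Size([4, 10, 16]) -- one output per time step
```

The only structural difference visible from the outside is the extra cell state returned by the LSTM; the GRU keeps its memory inside a single hidden state.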
Memory units – LSTMs and GRUs
LSTM
LSTMs...