RNN and LSTM networks are widely used in sequential tasks such as next-word prediction, machine translation, text generation, and more. However, one of the major challenges with recurrent models is capturing long-term dependencies.
To overcome this limitation of RNNs, a new architecture called the transformer was introduced in the paper Attention Is All You Need. The transformer is currently the state-of-the-art model for several NLP tasks. Its advent created a major breakthrough in the field of NLP and also paved the way for revolutionary new architectures such as BERT, GPT-3, T5, and more.
The transformer model is based entirely on the attention mechanism and completely gets rid of recurrence. The transformer uses a special type of attention mechanism called self-attention. We will learn about this in detail in the upcoming sections.
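As a quick preview before those sections, here is a minimal sketch of the idea behind scaled dot-product self-attention: each token's output is a weighted sum of all tokens' value vectors, with the weights derived from query-key similarity. The input embeddings and the projection matrices `W_q`, `W_k`, and `W_v` below are random placeholders for illustration; in a real transformer, they are learned during training.

```python
# A minimal sketch of scaled dot-product self-attention using NumPy.
# All matrices here are random placeholders, not learned weights.
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Compute self-attention over a sequence of token embeddings X."""
    Q = X @ W_q   # queries
    K = X @ W_k   # keys
    V = X @ W_v   # values
    d_k = K.shape[-1]
    # Similarity of every token with every other token, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted sum of all value vectors
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))          # 4 tokens, embedding dimension 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
Z = self_attention(X, W_q, W_k, W_v)
print(Z.shape)                       # (4, 8): one attended vector per token
```

The scaling by the square root of the key dimension keeps the dot products from growing too large, which would otherwise push the softmax into regions with very small gradients.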
Let's understand how the transformer works with a language translation task. The transformer consists of an encoder-decoder architecture: the encoder learns a representation of the input (source) sentence, and the decoder uses that representation to generate the output (target) sentence.