Understanding transformers
Transformers are a neural network architecture introduced in the paper "Attention Is All You Need" by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (Advances in Neural Information Processing Systems 30, 2017). They have been highly influential in NLP and form the basis of state-of-the-art models such as BERT and GPT.
The key innovation in transformers is the self-attention mechanism, which lets the model weigh the relevance of every word in the input when producing each output, so the representation of a word takes its full context into account. This is unlike earlier sequential models such as RNNs, which process the input one token at a time and therefore have a harder time capturing long-range dependencies between words.
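To make the idea concrete, the following is a minimal sketch of scaled dot-product self-attention in NumPy. The function name and the projection matrices W_q, W_k, and W_v are illustrative assumptions for this sketch, not part of any particular library; a real transformer learns these projections and uses multiple attention heads.

```python
import numpy as np

def self_attention(embeddings, W_q, W_k, W_v):
    """Scaled dot-product self-attention for one sequence (illustrative sketch).

    embeddings: (seq_len, d_model) input word vectors
    W_q, W_k, W_v: (d_model, d_k) projection matrices (learned in a real model)
    """
    Q = embeddings @ W_q  # queries: what each word is looking for
    K = embeddings @ W_k  # keys: what each word offers to others
    V = embeddings @ W_v  # values: the content that gets mixed together

    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) relevance of every word to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V  # each output row is a context-weighted mix of the values

# Toy usage: a sequence of 4 words with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one context-aware vector per word
```

Because every word attends to every other word in a single step, the distance between two related words no longer matters, which is exactly what makes long-range dependencies easier to capture than in a sequential model.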
Architecture of transformers
A transformer is composed of an encoder and a decoder, both of which are made up of several...