Understanding Neural Network Transformers
Not to be confused with the electrical devices of the same name, neural network transformers are the jack-of-all-trades variant of NNs. Transformers can process and capture patterns from data of virtually any modality, including sequential data such as text and time series, as well as images, audio, and video.
The transformer architecture was introduced in 2017, in the paper “Attention Is All You Need,” with the goal of replacing RNN-based sequence-to-sequence architectures, focusing primarily on the machine translation use case of converting text from one language to another. The resulting models outperformed the RNN-based baselines and showed that we don’t need the inductive bias toward the sequential nature of the data that RNNs build in. Transformers then became the root of a family of neural network architectures, branching off into model variants capable of capturing patterns in other data modalities...
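The mechanism that lets transformers capture dependencies without recurrence is scaled dot-product attention: every position attends to every other position in a single step. Here is a minimal NumPy sketch of that core operation (the function name and toy dimensions are illustrative, not from any particular library):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between all positions
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of all value vectors

# Toy self-attention: a "sequence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention uses Q = K = V = x
print(out.shape)  # (4, 8): one output vector per input position
```

Unlike an RNN, no position waits on the previous one; the whole sequence is processed in parallel, which is what makes the architecture modality-agnostic.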