Appendix I — Terminology of Transformer Models
The past decades have produced Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and other types of Artificial Neural Networks (ANNs). These architectures share a common core vocabulary.
Transformer models introduced new terms and gave some existing terms slightly different meanings. This appendix briefly describes the transformer architecture to clarify how deep learning vocabulary applies to transformers.
The motivation behind the transformer architecture reflects an industrial approach to deep learning. Because transformers rely mainly on matrix multiplications rather than sequential recurrence, they lend themselves to parallel processing. In addition, the architecture fits hardware optimization requirements well. Google, for example, took advantage of the stacked structure of transformers to design domain-specific optimized hardware that requires less floating-point precision.
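The trade-off behind reduced-precision hardware can be illustrated in a few lines. The sketch below (a minimal NumPy illustration, not Google's actual hardware stack or any specific accelerator API) runs the same matrix multiplication in float32 and in float16 and compares the results: the reduced-precision product is close enough for many deep learning workloads while halving memory per value.

```python
import numpy as np

# Illustrative sketch: the same matrix multiplication at full (float32)
# and reduced (float16) precision, mimicking accelerators that trade
# numerical precision for throughput and memory bandwidth.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

full = a @ b  # full-precision reference result
reduced = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

# The reduced-precision result differs only slightly from the reference.
max_err = np.abs(full - reduced).max()
print(f"max absolute difference: {max_err:.4f}")
```

The small maximum difference is the point: many neural network computations tolerate this loss of precision, which is why domain-specific chips can drop bits in exchange for speed.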
Designing transformer models implies taking hardware into account...