Introduction to efficient, light, and fast transformers
Transformer-based models have achieved state-of-the-art results on many NLP problems, but at the cost of quadratic memory and computational complexity. We can highlight the complexity issues as follows:
- The models are not able to process long sequences efficiently because their self-attention mechanism scales quadratically with the sequence length (quantified in the short sketch after this list).
- A typical experimental setup on a GPU with 16 GB of memory can handle sequences of up to 512 tokens for training and inference, but longer inputs quickly cause out-of-memory problems.
- NLP models keep growing, from the 110 million parameters of BERT-base to the 17 billion parameters of Turing-NLG and the 175 billion parameters of GPT-3, which raises serious concerns about computational and memory requirements.
- We also need to care about costs, production, reproducibility, and sustainability. Hence, we need faster and lighter transformers, especially on edge devices.
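To make these numbers concrete, the following back-of-the-envelope sketch estimates the memory taken by the self-attention score matrix as the sequence length grows, and the memory needed just to store the weights of the models mentioned above. It is an illustrative calculation, not code from any particular library; the batch size, head count, and float32 storage are assumptions made only for this example, and activations, gradients, and optimizer states are ignored.

```python
# Illustrative memory estimates for the issues listed above.
# Assumptions (not from the source text): batch size 1, 12 attention heads,
# float32 values (4 bytes), and weights-only storage for the parameter counts.

def attention_matrix_bytes(seq_len, num_heads=12, batch_size=1, bytes_per_value=4):
    """Memory for one layer's attention scores: batch x heads x seq_len x seq_len."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value

def parameter_bytes(num_params, bytes_per_param=4):
    """Memory to store the model weights in float32."""
    return num_params * bytes_per_param

GiB = 1024 ** 3

# The attention score matrix grows quadratically with the sequence length.
for seq_len in (512, 2048, 8192, 32768):
    mem = attention_matrix_bytes(seq_len) / GiB
    print(f"seq_len={seq_len:6d} -> attention scores ~ {mem:8.3f} GiB per layer")

# Weight storage for the models mentioned above (float32, weights only).
models = {"BERT-base": 110e6, "Turing-NLG": 17e9, "GPT-3": 175e9}
for name, n in models.items():
    print(f"{name:11s}: ~{parameter_bytes(n) / GiB:7.1f} GiB of weights")
```

Even under these simplified assumptions, a single layer's attention scores jump from a few megabytes at 512 tokens to tens of gigabytes at 32K tokens, and GPT-3-scale weights alone exceed the memory of any single GPU, which is why the efficient, light, and fast variants discussed in this chapter are needed.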