Working with Efficient Transformers
So far, you have learned how to design a Natural Language Processing (NLP) architecture to achieve successful task performance with transformers. In this chapter, you will first learn how to make lightweight, efficient models out of trained models using distillation, pruning, and quantization. Second, you will gain knowledge about efficient sparse transformers such as Linformer, BigBird, and Performer, and see how they perform on benchmarks such as memory versus sequence length and speed versus sequence length. You will also see the practical benefits of model size reduction.
This chapter matters because it is becoming difficult to run large neural models under limited computational capacity. It is therefore valuable to have a light general-purpose language model such as DistilBERT, which can then be fine-tuned with good performance, like its non-distilled counterpart. Transformer-based architectures face a complexity bottleneck: the self-attention mechanism scales quadratically with sequence length, which makes long-context tasks expensive in both memory and compute.
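As a quick illustration of two of the techniques covered in this chapter, the following is a minimal sketch that loads the distilled DistilBERT checkpoint with Hugging Face Transformers and applies PyTorch dynamic quantization to it. The checkpoint name, label count, and size-measurement helper are illustrative assumptions, not code from this chapter:

```python
import os
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# DistilBERT: a distilled, general-purpose model that can be fine-tuned
# much like its non-distilled counterpart (BERT).
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2  # num_labels is illustrative
)

# Dynamic quantization: store the Linear layers' float32 weights as int8,
# shrinking the model and speeding up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# The quantized copy runs a normal forward pass on CPU.
inputs = tokenizer("Efficient transformers are practical.", return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits

def size_mb(m):
    """Rough on-disk size of a model's parameters, in megabytes."""
    torch.save(m.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"original:  {size_mb(model):.1f} MB")
print(f"quantized: {size_mb(quantized_model):.1f} MB")
```

The quantized copy is noticeably smaller on disk because the Linear layers' float32 weights are stored as int8, while the embedding tables remain in floating point.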