To learn more, check out the following papers:
- Distilling the Knowledge in a Neural Network by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, available at https://arxiv.org/pdf/1503.02531.pdf
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf, available at https://arxiv.org/pdf/1910.01108.pdf
- TinyBERT: Distilling BERT for Natural Language Understanding by Xiaoqi Jiao et al., available at https://arxiv.org/pdf/1909.10351.pdf
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks by Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, and Jimmy Lin, available at https://arxiv.org/pdf/1903.12136.pdf