For more information, refer to the following papers:
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut, available at https://arxiv.org/pdf/1909.11942.pdf
- RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov, available at https://arxiv.org/pdf/1907.11692.pdf
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning, available at https://arxiv.org/pdf/2003.10555.pdf
- SpanBERT: Improving Pre-training by Representing and Predicting Spans by Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy, available at https://arxiv.org/pdf/1907.10529.pdf