To learn more, check out the following papers:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, available at https://arxiv.org/pdf/1810.04805.pdf.
- Gaussian Error Linear Units (GELUs), by Dan Hendrycks and Kevin Gimpel, available at https://arxiv.org/pdf/1606.08415.pdf.
- Neural Machine Translation of Rare Words with Subword Units, by Rico Sennrich, Barry Haddow, and Alexandra Birch, available at https://arxiv.org/pdf/1508.07909.pdf.
- Neural Machine Translation with Byte-Level Subwords, by Changhan Wang, Kyunghyun Cho, and Jiatao Gu, available at https://arxiv.org/pdf/1909.03341.pdf.
- Japanese and Korean Voice Search, by Mike Schuster and Kaisuke Nakajima, available at https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37842.pdf.