References
For more information on some of the topics covered in this chapter, please refer to the following resources:
- The Bitter Lesson, Rich Sutton, March 13, 2019: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics: https://aclanthology.org/N19-1423/ (also available at https://arxiv.org/pdf/1810.04805.pdf)
- Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, Volume 33, pages 1877–1901. Curran Associates, Inc.: https://arxiv.org/pdf/2005.14165v4.pdf
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/pdf/2010.11929v2.pdf
- An Ensemble of Simple Convolutional Neural Network Models for MNIST Digit Recognition: https://arxiv.org/pdf/2008.10400v2.pdf
- PaLM: Scaling Language Modeling with Pathways: https://arxiv.org/pdf/2204.02311v3.pdf
- Mogrifier LSTM: https://arxiv.org/pdf/1909.01792v2.pdf
- Improving Language Understanding by Generative Pre-Training: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators: https://arxiv.org/pdf/2003.10555.pdf
- Language (Technology) is Power: A Critical Survey of “Bias” in NLP: https://arxiv.org/pdf/2005.14050.pdf
- Scaling Laws for Neural Language Models: https://arxiv.org/pdf/2001.08361.pdf
- Training Compute-Optimal Large Language Models: https://arxiv.org/pdf/2203.15556.pdf
- Atlas: Few-shot Learning with Retrieval Augmented Language Models: https://arxiv.org/pdf/2208.03299.pdf