The architecture of OpenAI GPT models
Transformers went from trained models, to fine-tuned models, and finally to zero-shot models in less than three years, between the end of 2017 and the first half of 2020. A zero-shot GPT-3 transformer model requires no fine-tuning. The trained model's parameters are not updated for downstream multi-task settings, which opens a new era for NLP/NLU tasks.
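To make the zero-shot idea concrete, here is a minimal sketch in Python. It assumes the Hugging Face transformers library is installed and uses the publicly available GPT-2 checkpoint as a stand-in for GPT-3, whose weights are not downloadable; these choices are illustrative assumptions, not the OpenAI setup itself.

# A minimal sketch of zero-shot completion with a pretrained GPT model.
# Assumption: GPT-2 stands in for GPT-3, whose weights are not public.
from transformers import pipeline

# Load a pretrained GPT model; no fine-tuning, no gradient updates.
generator = pipeline("text-generation", model="gpt2")

# The downstream task is expressed directly in the prompt.
prompt = "Translate English to French: cheese =>"
completions = generator(prompt, max_length=30, num_return_sequences=1)

print(completions[0]["generated_text"])

The point of the sketch is that the same pretrained parameters serve any prompt-defined task; only the input text changes, never the weights.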
In this section, we will first understand the motivation of the OpenAI team that designed GPT models. We will begin by examining the evolution from fine-tuned to zero-shot models. Then we will see how to condition a transformer model to generate mind-blowing text completions. Finally, we will explore the architecture of GPT models.
Let's begin with the creation process followed by the OpenAI team.
From fine-tuning to zero-shot models
From the start, OpenAI's research teams, led by Radford et al. (2018), wanted to take transformers from trained models to GPT models. The goal was to train transformers on unlabeled...