The architecture of OpenAI GPT transformer models
Transformers went from trained models, to fine-tuned models, and finally to zero-shot models in less than three years, between the end of 2017 and the first half of 2020. A zero-shot GPT-3 transformer model requires no fine-tuning: the trained model's parameters are not updated for downstream multi-task work, which opens a new era for NLP/NLU tasks.
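To make the idea concrete, here is a minimal sketch of zero-shot usage: the downstream task is described entirely in the prompt, and no weights are updated. It assumes the Hugging Face transformers library and uses a public GPT-2 checkpoint as a stand-in for GPT-3, whose weights are not publicly available.

```python
# Zero-shot use of a GPT-style model: the task is phrased in the prompt,
# and the model's parameters are never updated.
# GPT-2 is used here only as a publicly available stand-in for GPT-3.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The downstream task (English-to-French translation) is expressed purely as text.
prompt = "Translate English to French:\ncheese =>"
output = generator(prompt, max_new_tokens=10, num_return_sequences=1)
print(output[0]["generated_text"])
```

The point of the sketch is not the quality of the completion, which a small checkpoint cannot match, but the workflow: the same frozen model can be pointed at different tasks simply by changing the prompt.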
In this section, we will first learn about the motivation of the OpenAI team that designed GPT models. We will begin with the shift from fine-tuned to zero-shot models. Then we will see how to condition a transformer model to generate mind-blowing text completions. Finally, we will explore the architecture of GPT models.
Let's start with the creation process of the OpenAI team.
The rise of billion-parameter transformer models
The speed at which transformers went from small models trained for NLP tasks to models that require little to no fine-tuning is staggering.
Vaswani et al...