GPT 1, 2, 3…
OpenAI is an AI research group that has been in the spotlight for quite some time thanks to newsworthy releases such as GPT, GPT-2, and, most recently, GPT-3. In this section, we briefly discuss these architectures and their novel contributions. Toward the end, we will use a pre-trained version of GPT-2 for our task of text generation.
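As a quick preview of that final step, the following is a minimal sketch of text generation with a pre-trained GPT-2 checkpoint, assuming the Hugging Face transformers library is installed; the prompt and generation parameters here are illustrative choices, not the exact configuration we will use later.

```python
from transformers import pipeline

# Load a text-generation pipeline backed by pre-trained GPT-2 weights.
generator = pipeline("text-generation", model="gpt2")

# Generate a short continuation for a sample prompt
# (max_length and num_return_sequences are illustrative values).
outputs = generator(
    "OpenAI has been in the spotlight because",
    max_length=40,
    num_return_sequences=1,
)
print(outputs[0]["generated_text"])
```

We will unpack how such a model is trained and what happens under the hood in the subsections that follow.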
Generative pre-training: GPT
The first model in this series is called GPT, or Generative Pre-Training. It was released in mid-2018, a few months before the BERT model. The paper11 presents a task-agnostic architecture based on the ideas of transformers and unsupervised pre-training. The GPT model was shown to outperform the previous state of the art on several benchmarks such as GLUE and SST-2, though it was soon overtaken by BERT, which was released shortly afterwards.
GPT is essentially a language model based on the transformer decoder we presented in the previous chapter (see the section on Transformers). Since...