Generative Pre-Training (GPT-2) model
OpenAI released the first version of the GPT model in June 2018. They followed up with GPT-2 in February 2019. This paper attracted much attention as full details of the large GPT-2 model were not released with the paper due to concerns of nefarious uses. The large GPT-2 model was released subsequently in November 2019. The GPT-3 model is the most recent, released in May 2020.
Figure 5.5 shows the number of parameters in the largest of each of these models:
Figure 5.5: Parameters in different GPT models
The first model used the standard Transformer decoder architecture with twelve layers, each with twelve attention heads and 768-dimensional embeddings, for a total of approximately 110 million parameters, which is very similar to the BERT model. The largest GPT-2 has over 1.5 billion parameters, and the most recently released GPT-3 model's largest variant has over 175 billion parameters!
Cost of training language...