Text Generation with OpenAI GPT-2 and GPT-3 Models
In 2020, Brown et al. (2020) described the training of an OpenAI GPT-3 model with 175 billion parameters on approximately one trillion words, using 50 petaflop/s-days of compute. That represents roughly 50 x 10^20 operations for 400 billion byte-pair-encoded tokens. At the same time, we learned that OpenAI had access to a tailor-made supercomputer containing 285,000 CPU cores and 10,000 GPUs.
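To put these figures in perspective, a petaflop/s-day corresponds to 10^15 floating-point operations per second sustained for one full day. The short Python sketch below is only an illustration of that arithmetic, not code from Brown et al. (2020); it converts the 50 petaflop/s-days and 400 billion tokens quoted above into a total operation count and an operations-per-token estimate.

# A rough, back-of-the-envelope check of the compute figures above.
# Assumption: a petaflop/s-day is 10**15 floating-point operations per
# second sustained for one day; the 50 petaflop/s-days and 400 billion
# tokens are the values quoted in this section.

PETAFLOP_PER_SECOND = 10**15   # operations per second at 1 petaflop/s
SECONDS_PER_DAY = 24 * 3600    # 86,400 seconds

def petaflop_s_days_to_ops(pfs_days):
    """Convert petaflop/s-days into a total number of operations."""
    return pfs_days * PETAFLOP_PER_SECOND * SECONDS_PER_DAY

total_ops = petaflop_s_days_to_ops(50)   # about 4.3 * 10**21 operations
tokens = 400 * 10**9                     # 400 billion BPE tokens

print(f"Total operations:     {total_ops:.2e}")
print(f"Operations per token: {total_ops / tokens:.2e}")   # about 10**10

Running this confirms that 50 petaflop/s-days amounts to roughly 4.3 x 10^21 operations in total, or on the order of 10^10 operations per token, which is the order of magnitude quoted above.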
A new era had begun: a battle of giants, armed with the ground-breaking intelligence of transformers and the power of supercomputers. Microsoft, Google, Facebook, Baidu, IBM, and others now produce game-changing AI resources several times a year. AI project managers and developers must continually find new ways to understand, tame, and implement these mind-blowing innovations.
The machine intelligence of OpenAI GPT-3, combined with the power of supercomputers, led Brown et al. (2020) to zero-shot experiments. The idea was...