The Rise of Suprahuman Transformers with GPT-3 Engines
In 2020, Brown et al. (2020) described the training of an OpenAI GPT-3 model containing 175 billion parameters, trained on huge datasets such as the 400 billion byte-pair-encoded tokens extracted from Common Crawl data. OpenAI ran the training on a Microsoft Azure supercomputer with 285,000 CPU cores and 10,000 GPUs.
The machine intelligence of OpenAI’s GPT-3 engines, combined with this supercomputer, led Brown et al. (2020) to zero-shot experiments. The idea was to use a trained model for downstream tasks without updating its parameters. The goal was for a trained model to go directly into multi-task production through an API that could even perform tasks it was not explicitly trained for.
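The zero-shot idea can be sketched as follows: the task is described entirely in the prompt, and no gradient updates or task-specific fine-tuning take place. The prompt-building helper below is illustrative, and the commented-out API call assumes the `openai` Python package's legacy completion interface with a hypothetical engine name; it is not the exact setup Brown et al. (2020) used.

```python
# Zero-shot sketch: the task is stated in plain text inside the prompt,
# so the trained model handles it without any further training.

def zero_shot_prompt(task: str, text: str) -> str:
    """Build a plain-text instruction for a task the model was never fine-tuned on."""
    return f"{task}\n\n{text}\n"

prompt = zero_shot_prompt(
    "Translate English to French:",
    "The transformer architecture scales remarkably well.",
)

# Hypothetical API call (requires an API key; the engine name is illustrative):
# import openai
# response = openai.Completion.create(engine="davinci", prompt=prompt, max_tokens=60)
# print(response.choices[0].text)

print(prompt)
```

In production, only the prompt changes from task to task (translation, summarization, question answering); the model parameters stay frozen behind the API.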
The era of suprahuman cloud AI engines was born. OpenAI’s API requires no high-level software skills or AI knowledge. You might wonder why I used the term “suprahuman.” You will discover that a GPT-3 engine can...