The architecture of OpenAI GPT transformer models
Brown et al. (2020) described the training of an OpenAI GPT-3 model containing 175 billion parameters on huge datasets, such as the 400 billion byte-pair-encoded tokens extracted from Common Crawl data. OpenAI ran the training on a Microsoft Azure supercomputer with 285,000 CPUs and 10,000 GPUs.

The machine intelligence of OpenAI's GPT-3 models and their supercomputer led Brown et al. (2020) to zero-shot experiments. The idea was to use a trained model for downstream tasks without further training the parameters. The goal was for a trained model to go directly into multi-task production with an API that could even perform tasks it wasn't trained for.

The era of suprahuman cloud AI models was born. OpenAI's API requires no high-level software skills or AI knowledge. You might wonder why I use the term "suprahuman." GPT-3 and a GPT-4 model (and soon more powerful ones) can perform many...
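The zero-shot idea described above comes down to prompt construction: the model receives only a natural-language task description (zero-shot), or the same description plus a few in-context demonstrations (few-shot), and in neither case are the parameters updated. A minimal sketch of the difference, assuming a generic completion-style interface (`build_prompt` is an illustrative helper, not part of any OpenAI library; the translation example follows the framing in Brown et al. (2020)):

```python
# Illustrative sketch of zero-shot vs. few-shot prompting for a GPT-style
# completion model, as framed in Brown et al. (2020). No API call is made;
# this only shows how the prompt text differs between the two settings.

def build_prompt(task_description, query, examples=None):
    """Build a prompt: zero-shot if no examples are given, few-shot otherwise."""
    parts = [task_description]
    for source, target in (examples or []):
        parts.append(f"{source} => {target}")  # in-context demonstration
    parts.append(f"{query} =>")  # the model completes after "=>"
    return "\n".join(parts)

# Zero-shot: only a task description, no demonstrations, no weight updates.
zero_shot = build_prompt("Translate English to French:", "cheese")

# Few-shot: the same task, plus a handful of in-context examples.
few_shot = build_prompt(
    "Translate English to French:",
    "cheese",
    examples=[("sea otter", "loutre de mer"), ("plush giraffe", "girafe peluche")],
)
```

In both cases the trained model is used as-is; the only thing that changes between tasks is the text of the prompt, which is what allows a single deployed model to serve many downstream tasks through one API.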