Learning more about large language models
Large language models are a class of machine learning models trained on a broad range of internet text.
The term “large” in “large language models” refers to the number of parameters these models have; GPT-3, for example, has 175 billion parameters. These models are trained with self-supervised learning on a large corpus of text: they learn either to predict the next word in a sequence (as GPT does) or to predict a masked word from its surrounding context (as BERT does, which is additionally trained to predict whether one sentence follows another). Because they are exposed to so much text, these models pick up grammar, facts about the world, and reasoning abilities, but also the biases present in their training data.
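To make next-word prediction concrete, here is a minimal sketch using the Hugging Face transformers library and PyTorch, assuming both are installed; the choice of “gpt2” as the model and the example prompt are illustrative, not prescriptive:

```python
# A minimal sketch of next-token prediction with a causal language model.
# Assumes the Hugging Face `transformers` library and PyTorch are installed;
# "gpt2" is an illustrative model choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The logits at the last position score every vocabulary token as a
# candidate continuation; the highest-scoring one is the model's
# predicted next word.
next_token_id = logits[0, -1].argmax()
print(tokenizer.decode(next_token_id))
```

During pretraining, the model is optimized so that the true next token in the corpus receives a high score at each position; prediction at inference time is the same computation run forward.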
These models are transformer-based, meaning they leverage the transformer architecture, which uses self-attention mechanisms to weigh the importance of each word in the input relative to every other word. This architecture allows these models to process all the tokens of a sequence in parallel and to capture long-range dependencies in text.
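The core of self-attention is the scaled dot-product operation, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Below is a minimal sketch of a single attention head in PyTorch; the function name, tensor sizes, and random weights are illustrative, and a real transformer adds multiple heads, masking, and learned projection layers:

```python
# A minimal sketch of scaled dot-product self-attention, the operation
# the transformer uses to weigh tokens against each other.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # Project each token embedding into query, key, and value vectors.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.size(-1)
    # Compare each token's query against every token's key; scaling by
    # sqrt(d_k) keeps the dot products in a numerically stable range.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Softmax turns the scores into attention weights that sum to 1.
    weights = F.softmax(scores, dim=-1)
    # Each token's output is a weighted average of all value vectors.
    return weights @ v

# Illustrative sizes: a sequence of 5 tokens, 16-dimensional embeddings.
seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 16])
```

The attention weights are what let the model decide, for each position, which other words in the input matter most when computing that position's representation.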