What are LLMs and how are they different from LMs?
An LM is an ML model trained to predict the next word (or character or subword, depending on the model's granularity) in a sequence, given the words that came before it (or, in some models, the surrounding words). It is a probabilistic model: it assigns a probability to each candidate continuation, which is what lets it generate text that follows a particular linguistic style or pattern.
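To make "predicting the next word" concrete, here is a minimal sketch in Python. The context and the probabilities are made up for illustration; a real LM would compute the distribution itself.

```python
# A toy illustration of next-word prediction. The context and the
# probabilities below are invented; a real LM would compute them.
context = ["the", "cat", "sat", "on", "the"]

# Hypothetical distribution over the next word given the context
next_word_probs = {"mat": 0.62, "floor": 0.21, "roof": 0.09, "sofa": 0.08}

# Greedy decoding: pick the most probable continuation
prediction = max(next_word_probs, key=next_word_probs.get)
print(prediction)  # mat
```

Because the model outputs a full distribution rather than a single word, we can also sample from it, which is how an LM produces varied rather than deterministic text.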
Before the advent of Transformer-based models such as Generative Pre-trained Transformers (GPTs) and Bidirectional Encoder Representations from Transformers (BERT), several other types of LMs were widely used in NLP tasks. The following subsections discuss a few of them.
n-gram models
These are some of the simplest LMs. An n-gram model uses the previous n-1 words to predict the nth word in a sentence. For example, in a bigram (2-gram) model, we would use the previous word to predict the next word. These models are easy to implement and computationally efficient, but they cannot capture dependencies longer than n words, and they suffer from data sparsity: as n grows, most n-grams never appear in the training data.
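The following sketch shows how simple a bigram model really is: training is just counting word pairs, and prediction is a lookup. The tiny corpus here is made up for illustration; any tokenized text would work.

```python
from collections import Counter, defaultdict

# "Train" a bigram model by counting consecutive word pairs.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def prob(prev, nxt):
    """P(nxt | prev), estimated from the bigram counts."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0

def predict_next(prev):
    """Most frequent follower of prev, or None if prev was never seen."""
    followers = bigram_counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # cat  ("cat" follows "the" twice)
print(prob("the", "mat"))   # 0.25 (1 of the 4 words following "the")
```

The sparsity problem is visible even here: any pair of words that never occurs in the corpus gets probability zero, which is why practical n-gram systems rely on smoothing techniques.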