Building a transformer model for language modeling
In this section, we will explore what transformers are and build one using PyTorch for the task of language modeling. We will also learn how to use some of their successors, such as BERT and GPT, via PyTorch's pretrained model repository. Before we start building a transformer model, let's quickly recap what language modeling is.
Reviewing language modeling
Language modeling is the task of estimating the probability that a word, or a sequence of words, follows a given sequence of words. For example, given "French is a beautiful _____" as our sequence of words, what is the probability that the next word is "language", or "word", and so on? These probabilities are computed by modeling the language using various probabilistic and statistical techniques. The idea is to observe a text corpus and learn the grammar by learning which words occur together and which words never occur together...
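To make the idea of learning which words occur together concrete, here is a minimal sketch of a count-based bigram model. It is not the transformer approach this section goes on to build. It is only an illustration of estimating next-word probabilities from a corpus. The four-sentence toy corpus and the helper names train_bigram_model and next_word_probability are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, how often each other word directly follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        # Pair every token with its immediate successor.
        for prev_word, next_word in zip(tokens, tokens[1:]):
            counts[prev_word][next_word] += 1
    return counts


def next_word_probability(counts, prev_word, candidate):
    """Estimate P(candidate | prev_word) as a relative frequency."""
    total = sum(counts[prev_word].values())
    if total == 0:
        return 0.0  # prev_word was never seen with a successor
    return counts[prev_word][candidate] / total


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    "french is a beautiful language",
    "spanish is a beautiful language",
    "paris is a beautiful city",
    "that is a beautiful word",
]

model = train_bigram_model(toy_corpus)
p_language = next_word_probability(model, "beautiful", "language")
p_word = next_word_probability(model, "beautiful", "word")
p_table = next_word_probability(model, "beautiful", "table")

print(p_language, p_word, p_table)
```

In this toy corpus, "beautiful" is followed by "language" twice, and by "city" and "word" once each, so the estimates are 0.5 for "language", 0.25 for "word", and 0.0 for the unseen "table". A bigram model conditions only on the single preceding word, which is one reason to move to models that can use the whole preceding sequence.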