Training a GPT-2 language model
In this section, we will train a GPT-2 model on a custom dataset that we will encode, and then interact with our customized model. We will use the same kant.txt dataset as in Chapter 3, Pretraining a RoBERTa Model from Scratch.
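Before running the notebook, it can be useful to glance at the raw text we are about to train on. The snippet below is a minimal sketch, assuming kant.txt has already been downloaded into the working directory; the file name matches the dataset from Chapter 3, while the path and sample size are illustrative.

    # Minimal sketch: inspect the raw training corpus before encoding it.
    # Assumes kant.txt has already been downloaded to the current directory.
    from pathlib import Path

    corpus_path = Path("kant.txt")  # illustrative local path to the dataset
    text = corpus_path.read_text(encoding="utf-8")

    print(f"Characters in corpus: {len(text):,}")
    print("First 300 characters:")
    print(text[:300])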
This section refers to the code of Training_OpenAI_GPT_2.ipynb, which is in this chapter's directory of the book's repository on GitHub.
It is important to note that we are running a low-level GPT-2 model, not a one-line call that produces a result, and we are avoiding pre-packaged versions. We are getting our hands dirty to understand the architecture of GPT-2 from scratch. You might see some deprecation messages, but the effort is worthwhile.
We will open the notebook and run it cell by cell.
Step 1: Prerequisites
The files referred to in this section are available in the chapter directory of the GitHub repository of this book:
- Activate the GPU in the notebook's runtime...
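Once the runtime type has been changed, you can confirm that the GPU is actually visible to the notebook. The check below is a minimal sketch; it assumes a Colab-style runtime with TensorFlow installed, which is the environment the low-level GPT-2 training code expects.

    # Minimal sketch: verify that a GPU is visible to the runtime.
    # Assumes a Colab-style notebook with TensorFlow installed.
    import tensorflow as tf

    device_name = tf.test.gpu_device_name()
    if device_name:
        print(f"GPU found at: {device_name}")
    else:
        print("No GPU found -- check the notebook's runtime settings.")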