Training a GPT-2 language model
In this section, we will train a GPT-2 model on a custom dataset that we will encode, and then interact with the customized model. We will use the same kant.txt dataset as in Chapter 4, Pretraining a RoBERTa Model from Scratch.
We will open the notebook and run it cell by cell.
Step 1: Prerequisites
The files referred to in this section are available in the AppendixIV
directory of this book’s GitHub repository:
- If you are running the notebook on Google Colab, activate the GPU in the notebook's Runtime menu, as explained in Step 1: Activating the GPU in Appendix III, Generic Text Completion with GPT-2 (a quick sanity check is sketched after this list).
- Upload the following Python files to Google Colaboratory with the built-in file manager: train.py, load_dataset.py, encode.py, accumulate.py, and memory_saving_gradients.py (a programmatic upload alternative is also sketched after this list).
- These files originally come from N Shepperd’s GitHub repository: https://github.com/nshepperd/gpt-2. However, you...
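To confirm that the GPU runtime is active before training, you can run a minimal sanity check such as the one below. This is only a sketch, assuming TensorFlow is preinstalled in the Colab environment (it is by default); tf.test.gpu_device_name() returns an empty string when no GPU is visible.

```python
# Minimal GPU sanity check, assuming TensorFlow is preinstalled
# in the Colab environment (true by default).
import tensorflow as tf

device_name = tf.test.gpu_device_name()  # '' when no GPU is visible
if device_name:
    print(f"GPU found at: {device_name}")
else:
    print("No GPU found -- select Runtime > Change runtime type > GPU.")
```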
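If you prefer to upload the five Python files programmatically rather than through the file manager, a minimal sketch using the google.colab.files API is shown below. Note that this module is only importable inside a Colab runtime, and files.upload() opens a browser file-selection dialog.

```python
# Minimal programmatic alternative to the built-in file manager.
# google.colab.files is only available inside a Colab runtime.
from google.colab import files

# Opens a browser dialog; select the five .py files listed above.
uploaded = files.upload()
for name, data in uploaded.items():
    print(f"Uploaded {name} ({len(data)} bytes)")
```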