Fine-tuning transformers
In this section, we will discuss how to fine-tune pre-trained transformer models. Here, we mainly focus on the BERT model, which is fully pre-trained, and we will work with the SQuAD 2.0 dataset.
The complete code base for running custom training on the BERT model is available on the Hugging Face website (https://huggingface.co/transformers/custom_datasets.html#qa-squad). Our earlier model parallelism implementation can be applied directly to this code base to speed up both model training and serving.
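Before fine-tuning, the SQuAD 2.0 data needs to be loaded. The following is a minimal sketch that assumes the Hugging Face datasets library; the linked tutorial instead works from the raw SQuAD 2.0 JSON files, but both routes yield the same question/context/answer records:

```python
# A minimal sketch, assuming the Hugging Face datasets library is
# installed (pip install datasets); "squad_v2" is the hub name for
# SQuAD 2.0.
from datasets import load_dataset

squad = load_dataset("squad_v2")

# Each record holds a question, its context passage, and the answers
# (empty for SQuAD 2.0's unanswerable questions).
sample = squad["train"][0]
print(sample["question"])
print(sample["answers"])   # {'text': [...], 'answer_start': [...]}
```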
Here, we highlight the important steps in the workflow of fine-tuning BERT on SQuAD 2.0. The overview is shown in the following screenshot:
As shown in the preceding screenshot, the whole fine-tuning process involves three steps, as follows:
- Tokenize the input strings.
- Download the pre-trained base model.
- Use the tokenized input to fine-tune the pre-trained model, as sketched after this list.
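To make these three steps concrete, here is a minimal, self-contained sketch in PyTorch with the transformers library. The bert-base-uncased checkpoint and the toy question/context pair are illustrative assumptions, not taken from the tutorial:

```python
# A minimal sketch of the three steps, assuming PyTorch and the
# transformers library; the checkpoint name and the toy example
# below are illustrative assumptions.
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForQuestionAnswering

# Step 1: tokenize the input strings (question/context pairs).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
question = "What does SQuAD stand for?"
context = ("SQuAD, the Stanford Question Answering Dataset, is a "
           "reading comprehension benchmark.")
answer = "Stanford Question Answering Dataset"
answer_start = context.find(answer)

encodings = tokenizer(question, context, truncation=True,
                      padding="max_length", max_length=128,
                      return_tensors="pt")

# Convert the answer's character span in the context into token
# positions, which the QA head trains against.
start_pos = encodings.char_to_token(0, answer_start, sequence_index=1)
end_pos = encodings.char_to_token(0, answer_start + len(answer) - 1,
                                  sequence_index=1)

# Step 2: download the pre-trained base model, with a QA head on top.
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")
model.train()

# Step 3: use the tokenized input to fine-tune the pre-trained model.
optimizer = AdamW(model.parameters(), lr=5e-5)
outputs = model(input_ids=encodings["input_ids"],
                attention_mask=encodings["attention_mask"],
                start_positions=torch.tensor([start_pos]),
                end_positions=torch.tensor([end_pos]))
outputs.loss.backward()   # one illustrative gradient step
optimizer.step()
optimizer.zero_grad()
```

In the full workflow, steps 1 and 3 run over the whole dataset in batches for several epochs, as in the training loop of the Hugging Face tutorial linked previously.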