Exploring fine-tuning transformers
In this section, we’ll use PyTorch to fine-tune a pre-trained transformer. More specifically, we’ll fine-tune a DistilBERT transformer encoder (DistilBERT, a distilled version of BERT: smaller, faster, cheaper, and lighter, https://arxiv.org/abs/1910.01108) to classify whether a movie review is positive or negative. We’ll use the Rotten Tomatoes dataset (https://huggingface.co/datasets/rotten_tomatoes, licensed under Apache 2.0), which contains around 10,000 reviews, split equally between positive and negative. We’ll implement the example with the help of the Transformers library’s Trainer class (https://huggingface.co/docs/transformers/main_classes/trainer), which implements the basic training loop, along with model evaluation, distributed training on multiple GPUs/TPUs, mixed-precision training, and other conveniences.
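The following is a minimal sketch of that pipeline. The distilbert-base-uncased checkpoint is a natural choice for this model, but the output directory, epoch count, and batch size here are illustrative assumptions rather than values prescribed above:

import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Load the Rotten Tomatoes dataset; it ships with train/validation/test splits
dataset = load_dataset("rotten_tomatoes")

# Tokenize the reviews with the DistilBERT tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

# DistilBERT encoder with a two-class classification head on top
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def compute_metrics(eval_pred):
    # Report plain accuracy on the evaluation split
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

training_args = TrainingArguments(
    output_dir="distilbert-rotten-tomatoes",  # assumed output path
    num_train_epochs=3,                       # assumed hyperparameters
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    # Pad each batch dynamically to its longest sequence
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate(dataset["test"]))

Using DataCollatorWithPadding pads each batch only to its longest member rather than to the model’s maximum sequence length, which keeps batches compact since most of these reviews are short.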