Adapting the model to the domain
Although the fine-tuning approach for Transformer architectures usually performs well, differences in the distribution of source and target data can have a significant impact on the effectiveness of fine-tuning (Transfer Learning in Natural Language Processing, Ruder et al., NAACL 2019). If the source and target datasets are substantially different, fine-tuning alone may struggle to close the gap without further adaptation.
Previous studies have shown that pretrained Transformers are the most robust architectures for handling word distribution shifts and have greater generalization capacity than other architectures (Adaptive Fine-tuning for Multiclass Classification over Software Requirement Data, Yildirim et al., arXiv:2301.00495, 2023). However, there is still room for improvement by using additional in-domain data (Don't Stop Pretraining: Adapt Language Models to Domains and Tasks, Gururangan et al., ACL 2020). By specializing the model to the target domain with continued pretraining on in-domain text, we can narrow the gap between source and target distributions before fine-tuning on the downstream task.
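The following is a minimal sketch of this kind of domain specialization: continued masked language modeling on an unlabeled in-domain corpus with Hugging Face Transformers. The checkpoint name, the corpus file in_domain_corpus.txt, and the hyperparameters are illustrative assumptions, not values prescribed by the papers cited above.

```python
# Domain-adaptive pretraining sketch: continue masked language modeling
# on unlabeled in-domain text before fine-tuning on the labeled task.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"  # any masked-LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled in-domain corpus, one document per line (hypothetical path).
raw = load_dataset("text", data_files={"train": "in_domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, as in standard MLM pretraining.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="domain_adapted_model",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
# The saved checkpoint is then fine-tuned on the labeled downstream task.
trainer.save_model("domain_adapted_model")
```

The adapted checkpoint can then be loaded with AutoModelForSequenceClassification (or any other task head) and fine-tuned as usual; the only change from the standard recipe is the extra in-domain pretraining step.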