Pretraining a RoBERTa Model from Scratch
In this chapter, we will build a RoBERTa model from scratch. The model will be assembled from the building blocks of the transformer construction kit we need for BERT-like models, and no pretrained tokenizers or models will be used. The RoBERTa model will be built following the fifteen-step process described in this chapter.
We will use the knowledge of transformers acquired in the previous chapters to build, step by step, a model that can perform language modeling on masked tokens. In Chapter 2, Getting Started with the Architecture of the Transformer Model, we went through the building blocks of the original Transformer. In Chapter 3, Fine-Tuning BERT Models, we fine-tuned a pretrained BERT model.
This chapter will focus on pretraining a transformer model from scratch in a Jupyter notebook built with Hugging Face's seamless modules. The model is named KantaiBERT.
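To give a sense of what such a from-scratch setup looks like, here is a minimal sketch that instantiates a randomly initialized RoBERTa-style model with Hugging Face's transformers library. The hyperparameter values (vocabulary size, number of layers, and so on) are illustrative assumptions, not the exact KantaiBERT configuration defined later in the chapter.

```python
# A minimal sketch, assuming the Hugging Face transformers library is installed.
# All hyperparameter values below are illustrative assumptions.
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=52_000,            # assumed tokenizer vocabulary size
    max_position_embeddings=514,  # maximum sequence positions
    num_hidden_layers=6,          # a small, DistilBERT-sized stack
    num_attention_heads=12,
    type_vocab_size=1,            # RoBERTa does not use segment embeddings
)

# The weights are randomly initialized: nothing is pretrained at this point.
model = RobertaForMaskedLM(config)
print(model.num_parameters())     # rough size check before pretraining begins
```

The key point of the sketch is that the model starts from a configuration object rather than from pretrained weights, which is exactly the situation KantaiBERT will be in before its pretraining run.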
KantaiBERT first loads a compilation of Immanuel Kant’s books created...