Training a tokenizer and pretraining a transformer
In this chapter, we will train a transformer model named KantaiBERT using the building blocks that Hugging Face provides for BERT-like models. We covered the theory of these building blocks in Chapter 3, Fine-Tuning BERT Models.
We will describe KantaiBERT, building on the knowledge we acquired in previous chapters.
KantaiBERT is a Robustly Optimized BERT Pretraining Approach (RoBERTa)-like model based on the architecture of BERT.
The original BERT models brought innovative features to the first transformer models, as we saw in Chapter 3. RoBERTa increases the performance of transformers on downstream tasks by improving the mechanics of the pretraining process.
For example, it does not use WordPiece tokenization but instead goes down to byte-level Byte-Pair Encoding (BPE). This method paved the way for a wide variety of BERT and BERT-like models.
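To make the difference concrete, here is a minimal, illustrative sketch of how byte-level BPE works, written in plain Python. This is not the Hugging Face implementation we will use for KantaiBERT; the function name `train_bpe` and the toy corpus are hypothetical, chosen only to show the core idea: start from the raw bytes of the text (so every input is representable and there are no unknown tokens) and repeatedly merge the most frequent adjacent pair of symbols.

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int):
    # Byte-level: start from the raw UTF-8 bytes, so any text is
    # representable with the 256 base symbols (no "unknown" tokens).
    tokens = [bytes([b]) for b in corpus.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of symbols in the current sequence.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Greedily pick the most frequent pair and record the merge rule.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the sequence, fusing every occurrence of that pair.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = train_bpe("low lower lowest", 3)
print(merges)  # the learned merge rules, most frequent pair first
print(tokens)  # the corpus re-encoded with the merged symbols
```

On this toy corpus, the first merges fuse the frequent byte pairs of "low" into a single symbol, which is exactly how a byte-level BPE vocabulary grows from 256 base bytes toward whole subwords. The real tokenizer we train later in the chapter learns tens of thousands of such merges from a full text corpus.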
In this chapter, KantaiBERT, like BERT, will...