Summary
In this chapter, we explored autoencoding models both theoretically and practically. Starting with the basics of BERT, we trained it from scratch along with a corresponding tokenizer. We also discussed how to work with other frameworks, such as Keras. Besides BERT, we reviewed other autoencoding models such as ALBERT, RoBERTa, ELECTRA, and DeBERTa. To avoid excessive code repetition, we did not provide the full implementation for training these other models. As part of the BERT training, we trained a WordPiece tokenizer. In the last part, we examined other tokenization algorithms, since it is worth understanding all of them: different Transformer architectures use different tokenization algorithms.
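As a quick reminder of the tokenizer-training step covered earlier in the chapter, the following is a minimal sketch of training a WordPiece tokenizer from scratch with the Hugging Face tokenizers library; the corpus file name and output directory are illustrative placeholders.

```python
# Minimal sketch: training a BERT-style WordPiece tokenizer from scratch.
from tokenizers import BertWordPieceTokenizer

# Instantiate an uncased WordPiece tokenizer
tokenizer = BertWordPieceTokenizer(lowercase=True)

# Train on a plain-text corpus (one sentence or document per line);
# "corpus.txt" is a placeholder path
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=30_000,
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

# Save the learned vocabulary (vocab.txt) to the current directory
tokenizer.save_model(".")

# Quick check: encode a sample sentence into WordPiece sub-tokens
print(tokenizer.encode("Tokenization with WordPiece.").tokens)
```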
Autoencoding models use the left encoder side of the original Transformer architecture and are mostly fine-tuned for classification problems. In the next chapter, we will discuss and learn about the right decoder part of Transformers to implement...
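Before moving on, the classification fine-tuning mentioned above can be illustrated with a minimal sketch using the Hugging Face transformers library; the checkpoint name, toy texts, and labels are placeholders, and a real workflow would use a proper dataset and training loop or Trainer.

```python
# Minimal sketch: fine-tuning an encoder-only (autoencoding) model
# for binary text classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # any encoder-only checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labeled batch for illustration only
texts = ["a great movie", "a boring movie"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One optimization step: the encoder output feeds a classification head
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```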