In this chapter, we will get started with one of the most widely used state-of-the-art text embedding models, called BERT. BERT has revolutionized the world of NLP by providing state-of-the-art results on many NLP tasks. We will begin the chapter by understanding what BERT is and how it differs from other embedding models. We will then look at how BERT works and examine its configuration in detail.
Moving on, we will learn in detail how the BERT model is pre-trained using two tasks: masked language modeling and next sentence prediction. We will then look into the pre-training procedure of BERT. At the end of the chapter, we will learn about several interesting subword tokenization algorithms, including byte pair encoding, byte-level byte pair encoding, and WordPiece.
In this chapter, we will cover the following topics:
- Basic idea of...