In Chapter 2, Understanding the BERT Model, we learned how to pre-train BERT using the masked language modeling and next-sentence prediction tasks. But pre-training BERT from scratch is computationally expensive, so instead we can download a pre-trained BERT model and use it. Google has open-sourced the pre-trained BERT model, and we can download it from Google Research's GitHub repository – https://github.com/google-research/bert. They have released pre-trained BERT models in various configurations, shown in the following figure, where L denotes the number of encoder layers and H denotes the size of the hidden unit (representation size):
The pre-trained model is also available in BERT-uncased and BERT-cased formats. In BERT-uncased, all the tokens are lowercased, but in BERT-cased, the tokens are not lowercased and are kept exactly as they appear in the text.
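To make the distinction concrete, the following minimal sketch uses the Hugging Face transformers library (an assumption here; the checkpoints from the Google Research repository can also be loaded directly with TensorFlow) to inspect the BERT-base configuration and compare how the uncased and cased tokenizers handle the same sentence:

from transformers import BertConfig, BertTokenizer

# Inspect the BERT-base configuration: L = 12 encoder layers, H = 768 hidden size
config = BertConfig.from_pretrained('bert-base-uncased')
print(config.num_hidden_layers, config.hidden_size)

# The uncased tokenizer lowercases the input before tokenizing,
# while the cased tokenizer keeps the original casing
uncased_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
cased_tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

sentence = "Paris is a beautiful city"
print(uncased_tokenizer.tokenize(sentence))
print(cased_tokenizer.tokenize(sentence))

The uncased tokenizer lowercases Paris to paris before looking it up in the vocabulary, whereas the cased tokenizer keeps it capitalized; this is also why the two variants ship with different vocabularies.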