XLM and mBERT
In this section, we will explain two models: mBERT and XLM. We chose these models because they are the foremost examples of multilingual models. mBERT is a multilingual model trained with MLM on a corpus drawn from many different languages, and it can operate on each of these languages separately. XLM, on the other hand, is trained on multilingual corpora using the MLM, CLM, and TLM language modeling objectives, and it can solve cross-lingual tasks. For instance, it can measure the similarity of sentences in two different languages by mapping them into a common vector space, which is not possible with mBERT.
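To make the common vector space idea concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public xlm-mlm-tlm-xnli15-1024 checkpoint. It mean-pools XLM's hidden states for an English and a German sentence and compares the resulting vectors with cosine similarity; the pooling strategy shown is just one simple choice, not a prescribed recipe:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Public XLM checkpoint trained with MLM + TLM on the XNLI-15 languages
# (an illustrative choice; other XLM checkpoints would also work).
tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-tlm-xnli15-1024")
model = AutoModel.from_pretrained("xlm-mlm-tlm-xnli15-1024")

def embed(sentence: str) -> torch.Tensor:
    # Encode the sentence and mean-pool the last hidden states
    # into a single fixed-size vector.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# The same meaning expressed in English and German.
en = embed("The weather is nice today.")
de = embed("Das Wetter ist heute schön.")

# Cosine similarity of the two sentences in the shared vector space.
similarity = torch.cosine_similarity(en, de, dim=0)
print(f"Cross-lingual similarity: {similarity.item():.3f}")
```

Sentences with similar meanings should yield a higher cosine score than unrelated sentence pairs, regardless of the language they are written in.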
mBERT
You are familiar with the BERT autoencoder model from Chapter 3, and how to train it using MLM on a specified corpus. Imagine a case where a large and diverse corpus is provided, drawn not from a single language but from 104 languages. Training on such a corpus would result in a multilingual version of BERT. However, training...