Cross-Lingual and Multilingual Language Modeling
Up to this point, you have learned a lot about transformer-based architectures, from encoder-only models to decoder-only models and from efficient transformers to long-context transformers. You have also learned about semantic text representation based on Siamese networks. However, we discussed all these models in terms of monolingual problems. We assumed that these models understand only a single language, rather than having a general, language-agnostic understanding of text. In fact, some of these models have multilingual variants: multilingual bidirectional encoder representations from transformers (mBERT), multilingual text-to-text transfer transformer (mT5), and multilingual bidirectional and auto-regressive transformer (mBART), to name but a few. On the other hand, some models are specifically designed for multilingual purposes and are trained with cross-lingual objectives. For example, the cross-lingual language model (XLM) is trained with explicitly cross-lingual objectives such as translation language modeling (TLM).
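To give a concrete sense of what "multilingual" means in practice, the following minimal sketch (assuming the Hugging Face transformers library and the publicly available bert-base-multilingual-cased checkpoint) loads mBERT and encodes sentences in two different languages with the same tokenizer and the same weights:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# mBERT: a single checkpoint pre-trained on Wikipedia text in over 100 languages
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# The same model and vocabulary handle inputs in different languages
sentences = [
    "This is a multilingual model.",       # English
    "Dies ist ein mehrsprachiges Modell.", # German
]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings for both sentences come from one shared model
print(outputs.last_hidden_state.shape)  # (2, sequence_length, 768)
```

The point to notice is that a single vocabulary and a single set of weights serve both inputs; how well such shared representations actually transfer across languages is the subject of this chapter.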