In the previous sections, we learned how M-BERT works and how it is used across many different languages. However, instead of relying on a single M-BERT model for all languages, can we train a monolingual BERT for a specific target language? We can, and that is precisely what we will learn in this section. We will look into several interesting and popular monolingual BERT models for various languages, as indicated here:
- FlauBERT for French
- BETO for Spanish
- BERTje for Dutch
- German BERT
- Chinese BERT
- Japanese BERT
- FinBERT for Finnish
- UmBERTo for Italian
- BERTimbau for Portuguese
- RuBERT for Russian
FlauBERT for French
FlauBERT, which stands for French Language Understanding via BERT, is a pre-trained BERT model for the French language. FlauBERT outperforms multilingual and cross-lingual models on many downstream French NLP tasks.
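To get a feel for how FlauBERT is used in practice, here is a minimal sketch of loading the pre-trained model through the Hugging Face transformers library; it assumes the transformers library is installed, uses the flaubert/flaubert_base_cased checkpoint as one example of the available FlauBERT checkpoints, and the input sentence is just an illustrative example:

```python
import torch
from transformers import FlaubertModel, FlaubertTokenizer

# Download and load the pre-trained FlauBERT model and its tokenizer
# (flaubert/flaubert_base_cased is one of several published checkpoints)
tokenizer = FlaubertTokenizer.from_pretrained('flaubert/flaubert_base_cased')
model = FlaubertModel.from_pretrained('flaubert/flaubert_base_cased')

# Tokenize an example French sentence and convert it to model inputs
sentence = "Paris est ma ville préférée"
inputs = tokenizer(sentence, return_tensors='pt')

# Feed the inputs to the model to obtain contextual token representations
with torch.no_grad():
    outputs = model(**inputs)

# The last hidden state holds one embedding vector per token;
# for the base model, each vector has 768 dimensions
print(outputs.last_hidden_state.shape)
```

From the last hidden state, we can take the representation of individual tokens or aggregate them into a sentence representation, just as we would with any other BERT variant.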
FlauBERT is trained on a huge, heterogeneous French corpus consisting of 24 sub-corpora containing...