Cross-lingual zero-shot learning
In the previous sections, you learned how to perform zero-shot text classification using monolingual models. Because using XLM-R for multilingual and cross-lingual zero-shot classification follows exactly the same approach and code we used previously, we will use mT5 here instead.
mT5 is a massively multilingual pretrained language model based on the encoder-decoder Transformer architecture, and its design is essentially the same as T5's. Whereas T5 is pretrained only on English, mT5 is trained on 101 languages drawn from the multilingual Common Crawl corpus (mC4).
A version of mT5 fine-tuned on the XNLI dataset is available from the Hugging Face Hub (https://huggingface.co/alan-turing-institute/mt5-large-finetuned-mnli-xtreme-xnli).
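If you want to try this checkpoint yourself, the following is a minimal loading sketch, assuming PyTorch, the transformers library, and sentencepiece are installed; the inference procedure itself is discussed next:

from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "alan-turing-institute/mt5-large-finetuned-mnli-xtreme-xnli"

# The tokenizer converts raw text into mT5's SentencePiece token IDs
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The model is an encoder-decoder (text-to-text) network, so even an NLI
# prediction comes back as generated text rather than a class logit
model = MT5ForConditionalGeneration.from_pretrained(model_name)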
T5 and its variant, mT5, are purely text-to-text models, which means they produce text output for any task they are given, even if that task is classification or NLI. Consequently, running inference with this model requires extra steps. We’ll take...