Working with benchmarks and datasets
Before introducing the datasets library, it is worth discussing important benchmarks such as General Language Understanding Evaluation (GLUE), Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME), and the Stanford Question Answering Dataset (SQuAD). Benchmarking is especially important for measuring how well learned knowledge transfers in multitask and multilingual settings. In NLP, we mostly focus on a particular metric, that is, a performance score on a certain task or dataset. Thanks to the Transformers library, we are able to transfer what we have learned from a particular task to a related task, which is called Transfer Learning (TL). By transferring representations between related problems, we are able to train general-purpose models that share common linguistic knowledge across tasks, an approach known as Multi-Task Learning (MTL). Another aspect of TL is transferring knowledge across natural languages (multilingual models).
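To make the idea of a per-task performance score concrete, here is a minimal sketch in plain Python of two metrics commonly reported on classification benchmarks: accuracy and binary F1. The function names (accuracy, f1_binary) and the toy labels and predictions are illustrative assumptions, not part of any benchmark's official tooling or of the datasets library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: gold labels and model predictions for six examples.
labels = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 1, 1]

print(f"accuracy = {accuracy(labels, predictions):.3f}")
print(f"F1       = {f1_binary(labels, predictions):.3f}")
```

A benchmark typically fixes both the dataset split and the metric for each task, so scores from different models can be compared directly.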