Working with benchmarks and datasets
Thanks to the Transformer architecture and the Transformers library, we can transfer what a model has learned on one task to a related task, which is called transfer learning (TL). By transferring representations between related problems, we can train general-purpose models that share common linguistic knowledge across tasks. This transfer can happen either by solving many tasks at the same time, known as multitask learning (MTL), or by training on tasks one after another, known as sequential transfer learning (STL). Benchmarks measure the extent to which a language model possesses these capabilities.
Before introducing the datasets library, it is worth discussing important benchmarks such as General Language Understanding Evaluation (GLUE), Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME), and the Stanford Question Answering Dataset (SQuAD). Benchmarking is especially critical for evaluating...
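As a quick taste of what is ahead, the following is a minimal sketch of how a benchmark task can be pulled down with the datasets library; it assumes the library is installed (pip install datasets) and uses the MRPC subset of GLUE purely as an example:

from datasets import load_dataset

# Load the MRPC task from the GLUE benchmark; the first call
# downloads the data and caches it locally for later reuse.
mrpc = load_dataset("glue", "mrpc")

# The result is a DatasetDict with train/validation/test splits.
print(mrpc)

# Inspect a single example: two sentences and a paraphrase label.
print(mrpc["train"][0])

Other benchmark datasets follow the same pattern, for instance load_dataset("squad") for SQuAD; we will work with these in detail later in the chapter.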