[Image source: ERNIE 2.0 research paper]
Based on this idea, Baidu has proposed a continual pre-training framework for language understanding in which pre-training tasks can be incrementally built and learned through multi-task learning in a continual way. According to Baidu, different customized tasks can be introduced into the framework incrementally at any time, and these tasks are trained together through multi-task learning, which enables the encoding of lexical, syntactic, and semantic information across tasks. Whenever a new task arrives, the framework incrementally trains the distributed representations without forgetting the parameters learned on previous tasks.
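To make the idea concrete, here is a minimal sketch in plain PyTorch, with a toy encoder, synthetic batches, and made-up task names, of how a shared encoder can pick up tasks one at a time and keep training on all tasks seen so far. It illustrates sequential multi-task learning in general, not Baidu's actual implementation or its specific pre-training tasks.

```python
# Toy sketch of continual multi-task pre-training: a shared encoder gains a new
# task-specific head whenever a task arrives and keeps training on every task
# seen so far. Names (SharedEncoder, add_task, the task labels) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, SEQ_LEN = 1000, 64, 16

class SharedEncoder(nn.Module):
    """Stands in for the transformer encoder shared by all pre-training tasks."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.proj = nn.Linear(HIDDEN, HIDDEN)

    def forward(self, token_ids):
        # Mean-pool token embeddings into one sentence vector (toy simplification).
        return torch.tanh(self.proj(self.embed(token_ids).mean(dim=1)))

encoder = SharedEncoder()
task_heads = nn.ModuleDict()   # one lightweight head per pre-training task
task_data = {}                 # cached batches for previously seen tasks

def add_task(name, num_labels, batch):
    """Introduce a new task, then train it jointly with all earlier tasks."""
    task_heads[name] = nn.Linear(HIDDEN, num_labels)
    task_data[name] = batch
    params = list(encoder.parameters()) + list(task_heads.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(50):        # a few multi-task training steps
        opt.zero_grad()
        # Summing losses over every task seen so far keeps the shared encoder
        # from forgetting earlier tasks when a new one arrives.
        loss = sum(F.cross_entropy(task_heads[t](encoder(x)), y)
                   for t, (x, y) in task_data.items())
        loss.backward()
        opt.step()

def toy_batch(num_labels, n=32):
    return (torch.randint(0, VOCAB, (n, SEQ_LEN)),
            torch.randint(0, num_labels, (n,)))

# Tasks arrive one after another; each new task is learned together with the old ones.
add_task("word_level_task", 2, toy_batch(2))
add_task("structure_level_task", 3, toy_batch(3))
add_task("semantic_level_task", 2, toy_batch(2))
```

The key point is in the training loop: every step sums the losses of all tasks introduced so far, so learning the newest task does not overwrite what the shared encoder learned from the earlier ones.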
The Structure of the Released ERNIE 2.0 Model
[Figure: the structure of the released ERNIE 2.0 model. Source: ERNIE 2.0 research paper]
ERNIE 2.0 is a continual pre-training framework that provides a feasible scheme for developers to build their own NLP models. The fine-tuning source code of ERNIE 2.0 and the pre-trained English models can be downloaded from the project's GitHub page.
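For a sense of what the downstream fine-tuning step looks like, here is a generic, self-contained PyTorch sketch with made-up class names and random data. It shows only the common pattern of putting a small task-specific head on top of a pre-trained encoder and training with a small learning rate; it does not reflect the actual interface of Baidu's repository, which is built on PaddlePaddle.

```python
# Generic fine-tuning sketch: load a pre-trained encoder, add a task head,
# and train on labelled downstream data. ToyEncoder and Classifier are
# illustrative stand-ins, not the ERNIE 2.0 repository's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Placeholder for a pre-trained sentence encoder."""
    def __init__(self, vocab=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)

    def forward(self, token_ids):
        return self.embed(token_ids).mean(dim=1)   # toy sentence representation

class Classifier(nn.Module):
    """Pre-trained encoder plus a task-specific classification head."""
    def __init__(self, encoder, hidden=64, num_labels=2):
        super().__init__()
        self.encoder = encoder                     # weights would come from pre-training
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, token_ids):
        return self.head(self.encoder(token_ids))

encoder = ToyEncoder()                             # in practice: load the downloaded weights
model = Classifier(encoder)
opt = torch.optim.Adam(model.parameters(), lr=2e-5)  # small lr preserves pre-trained knowledge

tokens = torch.randint(0, 1000, (8, 16))           # fake batch of tokenised sentences
labels = torch.randint(0, 2, (8,))                 # fake sentiment labels
for _ in range(3):                                 # a few fine-tuning epochs
    opt.zero_grad()
    loss = F.cross_entropy(model(tokens), labels)
    loss.backward()
    opt.step()
```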
The team at Baidu compared the performance of the ERNIE 2.0 model with existing pre-training models on the English GLUE benchmark and on 9 popular Chinese datasets. The results show that the ERNIE 2.0 model outperforms BERT and XLNet on 7 GLUE language understanding tasks and outperforms BERT on all 9 Chinese NLP tasks, such as machine reading comprehension on DuReader, sentiment analysis, and question answering.
Specifically, the experimental results on the GLUE datasets show that ERNIE 2.0, in both its base and large configurations, outperforms BERT and XLNet on almost all of the English tasks. Furthermore, the research paper reports that the ERNIE 2.0 large model achieves the best performance and sets new state-of-the-art results on the Chinese NLP tasks.
[Image source: ERNIE 2.0 research paper]
To learn more about ERNIE 2.0, read the research paper and check out the official blog post on Baidu's website.