Potential for improvement – better accuracy and faster training
At the beginning of Chapter 13, we listed several criteria that can be used to evaluate NLU systems. The one that we usually think of first is accuracy – that is, given a specific input, did the system provide the right answer? Although we may eventually decide to give another criterion priority over accuracy in a particular application, accuracy remains essential.
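To make the accuracy criterion concrete, the following minimal sketch computes accuracy alongside the F1 score for a handful of made-up sentiment predictions. The labels and the helper names (confusion_counts, accuracy, f1_score) are illustrative assumptions, not results or code from Chapter 13. The sketch shows that the two metrics are related but not identical, so an F1 score should be read as an approximate, rather than exact, indication of the error rate.

```python
# A minimal sketch with hypothetical data: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # made-up gold labels
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]  # made-up system output


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of inputs for which the system gave the right answer."""
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


print(f"accuracy: {accuracy(y_true, y_pred):.3f}")  # accuracy: 0.700
print(f"F1:       {f1_score(y_true, y_pred):.3f}")  # F1:       0.727
```

On this toy data, the system is wrong on 3 of 10 inputs (accuracy 0.70), while its F1 score is about 0.73. On a dataset with balanced classes the two numbers tend to be close, but they can diverge considerably when one class is much more frequent than the other.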
Better accuracy
As we saw in Chapter 13, even our best-performing system, the large Bidirectional Encoder Representations from Transformers (BERT) model, only achieved an F1 score of 0.85 on the movie review dataset, meaning that roughly 15% of its classifications were incorrect. State-of-the-art LLM-based research systems currently report an accuracy of 0.93 on this dataset, which still means that these systems misclassify about 7% of the reviews (SiYu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, and Haifeng Wang. 2021. ERNIE-Doc: A Retrospective Long-Document...