Why evaluate an NLU system?
We can ask many questions about the overall quality of an NLU system, and evaluation is how we answer them. How we evaluate depends on the goal of developing the system and on what we need to learn about it to confirm that the goal has been achieved.
Different kinds of developers have different goals. Consider the following examples:
- I am a researcher, and I want to learn whether my ideas advance the science of NLU. Another way to put this is to ask how my work compares to the state of the art (SOTA) – that is, the best results that anyone has reported on a particular task.
- I am a developer, and I want to make sure that my overall system performance is good enough for an application.
- I am a developer, and I want to see how much my changes improve a system.
- I am a developer, and I want to make sure my changes have not decreased a system’s...