Evaluation metrics
Two important concepts should guide the choice of an evaluation metric for NLP systems or, more generally, for any system we want to evaluate:
- Validity: A metric is valid if it corresponds to what we intuitively think of as the property we actually want to know about. For example, we would not use the length of a text to measure whether its sentiment is positive or negative, because length is not a valid measure of sentiment.
- Reliability: A metric is reliable if measuring the same thing repeatedly always gives the same result.
In the following sections, we will look at some of the most commonly used NLU metrics that are considered both valid and reliable.
Accuracy and error rate
In Chapter 9, we defined accuracy as the number of correct system responses divided by the overall number of inputs. Similarly,...
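As a minimal sketch of the accuracy definition above, the following Python code counts the system responses that match the gold labels and divides by the total number of inputs. The function names and the sentiment labels are hypothetical and chosen only for illustration. The error rate is assumed here to be the standard complement of accuracy (incorrect responses divided by the total number of inputs); this is a common convention, not necessarily the exact definition used elsewhere in this book.

```python
from typing import Sequence


def _check(predictions: Sequence[str], gold: Sequence[str]) -> None:
    # Both sequences must align one-to-one, and the evaluation set must not be empty.
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot evaluate on an empty test set")


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Number of correct system responses divided by the total number of inputs."""
    _check(predictions, gold)
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)


def error_rate(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Assumed convention: number of incorrect responses divided by the total number of inputs."""
    _check(predictions, gold)
    incorrect = sum(1 for p, g in zip(predictions, gold) if p != g)
    return incorrect / len(gold)


# Hypothetical gold labels and system outputs for five test inputs
gold = ["pos", "neg", "neg", "pos", "pos"]
predictions = ["pos", "neg", "pos", "pos", "neg"]

print(accuracy(predictions, gold))    # 0.6 (3 of 5 correct)
print(error_rate(predictions, gold))  # 0.4 (2 of 5 incorrect)
```

Computing the error rate directly from the incorrect responses, rather than as `1 - accuracy`, avoids floating-point rounding noise; under this convention the two values always sum to 1.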