Statistical significance of differences
The last general evaluation topic we will cover is how to determine whether the differences between the results of our experiments reflect a real difference between the experimental conditions, or whether they are simply due to chance. This is the question of statistical significance. We can’t know for certain whether a difference in metric values represents a real difference between systems, but we can estimate how likely it is that a difference as large as the one we observed would arise by chance alone. Let’s suppose our data looks like the situation shown in Figure 13.3:

Figure 13.3 – Two distributions of measurement values – do they reflect a real difference between the things they’re measuring?
Figure 13.3 shows two sets of measurements, one with a mean of 0, on the left, and one with a mean of 0.75, on the...
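To make the question concrete, here is a minimal sketch of one common way to estimate how often a difference this large would arise by chance: a permutation test. It is an illustration under stated assumptions, not necessarily the method used for the figure. The function name permutation_p_value, the sample size of 50, the standard deviation of 1.0, and the random seeds are all hypothetical choices; only the two means, 0 and 0.75, come from the text.

```python
import random
import statistics


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b.

    Hypothetical helper for illustration: it repeatedly shuffles the pooled
    measurements and counts how often a random split produces a difference
    in means at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


# Assumed data: two normal samples with means 0 and 0.75.
# The standard deviation (1.0) and sample size (50) are illustrative guesses.
data_rng = random.Random(42)
sample_a = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
sample_b = [data_rng.gauss(0.75, 1.0) for _ in range(50)]

p_value = permutation_p_value(sample_a, sample_b)
observed_diff = statistics.fmean(sample_b) - statistics.fmean(sample_a)
print(f"observed difference in means: {observed_diff:.3f}")
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means that random relabeling of the measurements rarely produces a difference as large as the observed one, which is evidence that the difference is not due to chance alone. The permutation test is used here because it needs nothing beyond the standard library and makes few assumptions about the shape of the distributions.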