We mentioned earlier that true and false positives were defined by the number of predictions matching or not matching the ground truth boxes. However, how do you decide when a prediction and the ground truth are matching? A common metric is the Jaccard index, which measures how well two sets overlap (in our case, the sets of pixels represented by the boxes). Also known as Intersection over Union (IoU), it is defined as follows:
|𝐴| and |𝐵| are the cardinality of each set; that is, the number of elements they each contain. 𝐴 ⋂ 𝐵 is the intersection of the two sets, and therefore the numerator |𝐴 ⋂ 𝐵| represents the number of elements they have in common. Similarly, 𝐴 ⋃ 𝐵 is the union of the sets (as seen in the following diagram), and therefore the denominator |𝐴 ⋃ 𝐵| represents the total number of elements the two sets cover together: