YOLO's final output is a w × h × M matrix, where w × h is the size of the grid, and M corresponds to the formula B × (C + 5), where the following applies:
- B is the number of bounding boxes per grid cell.
- C is the number of classes (in our example, we will use 20 classes).
Notice that we add 5 to the number of classes. This is because, for each bounding box, we need to predict (C + 5) numbers:
- tx and ty will be used to compute the coordinates of the center of the bounding box.
- tw and th will be used to compute the width and height of the bounding box.
- c is the confidence that an object is in the bounding box.
- p1, p2, ..., pC are the probabilities that the bounding box contains an object of class 1, 2, ..., C (where C = 20 in our example).
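To make the arithmetic concrete, the following is a minimal sketch in plain Python that computes M and splits the M values of a single grid cell into per-box predictions. Several things here are assumptions made for illustration and do not come from the text: the value B = 5, the helper names `output_depth` and `split_cell`, and the memory layout (boxes stored one after another, each as tx, ty, tw, th, c, p1, ..., pC). An actual implementation may order the values differently.

```python
def output_depth(num_boxes, num_classes):
    """Return M = B * (C + 5), the number of values predicted per grid cell."""
    return num_boxes * (num_classes + 5)


def split_cell(cell_values, num_boxes, num_classes):
    """Split the M values of one grid cell into per-box predictions.

    Assumed layout (hypothetical): boxes are stored consecutively, each as
    [tx, ty, tw, th, c, p1, ..., pC].
    """
    per_box = num_classes + 5
    if len(cell_values) != num_boxes * per_box:
        raise ValueError("expected B * (C + 5) values per grid cell")
    boxes = []
    for b in range(num_boxes):
        chunk = cell_values[b * per_box:(b + 1) * per_box]
        boxes.append({
            "tx": chunk[0], "ty": chunk[1],   # used to compute the box center
            "tw": chunk[2], "th": chunk[3],   # used to compute width and height
            "c": chunk[4],                    # objectness confidence
            "class_probs": chunk[5:],         # p1 ... pC
        })
    return boxes


B, C = 5, 20             # B = 5 is an assumed value; C = 20 as in our example
M = output_depth(B, C)   # 5 * (20 + 5) = 125
cell = list(range(M))    # dummy values standing in for one cell's predictions
boxes = split_cell(cell, B, C)
print(M, len(boxes), len(boxes[0]["class_probs"]))  # 125 5 20
```

With these assumed values, each grid cell holds 125 numbers, so a w × h grid yields a w × h × 125 output.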
This diagram summarizes how the output matrix appears: