YOLO's final output is a w × h × M matrix, where w × h is the size of the grid and M = B × (C + 5), with the following definitions:
B is the number of bounding boxes per grid cell.
C is the number of classes (in our example, we will use 20 classes).
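As a quick check of the formula, the following Python sketch computes the output shape. The function name `output_shape` is hypothetical, and the grid size and B = 5 used in the usage line are illustrative assumptions; only C = 20 comes from our example.

```python
def output_shape(grid_w: int, grid_h: int, num_boxes: int, num_classes: int) -> tuple[int, int, int]:
    """Return the (w, h, M) shape of the output, with M = B * (C + 5)."""
    # 5 extra values per box: tx, ty, tw, th and the confidence c
    m = num_boxes * (num_classes + 5)
    return (grid_w, grid_h, m)


# Assumed values: a 13 x 13 grid, B = 5 boxes per cell, C = 20 classes
print(output_shape(13, 13, 5, 20))  # (13, 13, 125)
```

Each grid cell therefore carries 5 × 25 = 125 numbers under these assumed settings.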
Notice that we add 5 to the number of classes. This is because, for each bounding box, we need to predict (C + 5) numbers:
tx and ty will be used to compute the coordinates of the center of the bounding box.
tw and th will be used to compute the width and height of the bounding box.
c is the confidence that an object is in the bounding box.
p1, p2, ..., and pC are the probabilities that the bounding box contains an object of class 1, 2, ..., C, respectively (where C = 20 in our example).
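To make the per-box layout concrete, here is a minimal Python sketch that splits one grid cell's M values into B boxes and decodes them. It rests on assumptions not stated above: the (C + 5) values of each box are stored contiguously in the order tx, ty, tw, th, c, p1, ..., pC, and the raw values are decoded YOLOv2-style, with a sigmoid on tx, ty and c, an exponential scaling of anchor (prior) sizes by tw and th, and a softmax over the class scores. Other YOLO versions differ (YOLOv3, for instance, uses independent sigmoids for the classes). The function `decode_cell`, its parameters, and the anchor sizes are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def decode_cell(cell: list[float], num_boxes: int, num_classes: int,
                cx: int, cy: int, anchors: list[tuple[float, float]]) -> list[dict]:
    """Split one grid cell's B * (C + 5) values into B decoded boxes.

    Assumed layout per box: tx, ty, tw, th, c, p1..pC (YOLOv2-style decoding).
    Centers and sizes are expressed in grid-cell units.
    """
    stride = num_classes + 5
    if len(cell) != num_boxes * stride:
        raise ValueError(f"expected {num_boxes * stride} values, got {len(cell)}")
    boxes = []
    for b in range(num_boxes):
        start = b * stride
        tx, ty, tw, th, tc = cell[start:start + 5]
        class_scores = cell[start + 5:start + stride]
        pw, ph = anchors[b]  # hypothetical anchor width/height for box b
        boxes.append({
            "x": cx + sigmoid(tx),        # cell column plus an offset in (0, 1)
            "y": cy + sigmoid(ty),        # cell row plus an offset in (0, 1)
            "w": pw * math.exp(tw),       # anchor width scaled by exp(tw)
            "h": ph * math.exp(th),       # anchor height scaled by exp(th)
            "confidence": sigmoid(tc),    # objectness confidence c
            "class_probs": softmax(class_scores),  # p1..pC
        })
    return boxes


# Usage: B = 2, C = 3, so one cell holds 2 * (3 + 5) = 16 values
anchors = [(1.0, 2.0), (3.0, 1.5)]  # hypothetical anchor sizes in grid units
boxes = decode_cell([0.0] * 16, num_boxes=2, num_classes=3, cx=4, cy=7, anchors=anchors)
print(boxes[1]["x"], boxes[1]["w"], boxes[1]["confidence"])  # 4.5 3.0 0.5
```

With all-zero inputs every box sits at the center of its cell, takes exactly its anchor size, and gets a confidence of 0.5 and uniform class probabilities, which makes a handy sanity check for the indexing.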
The following diagram summarizes the structure of the output matrix: