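In practice, YOLOv2 computes the coordinates of each final bounding box using the following formulas:

$$b_x = \operatorname{sigmoid}(t_x) + c_x$$
$$b_y = \operatorname{sigmoid}(t_y) + c_y$$
$$b_w = p_w \exp(t_w)$$
$$b_h = p_h \exp(t_h)$$

The terms of the preceding equations can be explained as follows: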
- $t_x$, $t_y$, $t_w$, and $t_h$ are the outputs from the last layer.
- $b_x$ and $b_y$ are the center coordinates of the predicted bounding box, and $b_w$ and $b_h$ are its width and height.
- $p_w$ and $p_h$ represent the original width and height of the anchor box.
- $c_x$ and $c_y$ are the coordinates of the current grid cell. For a grid that is $w$ cells wide and $h$ cells high, they will be $(0, 0)$ for the top-left cell, $(w - 1, 0)$ for the top-right cell, and $(0, h - 1)$ for the bottom-left cell.
- exp is the exponential function.
- sigmoid is the sigmoid function, described in Chapter 1, Computer Vision and Neural Networks.
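To make the roles of these terms concrete, the following is a minimal sketch of the decoding step for a single anchor box in a single grid cell. It uses plain Python rather than the framework code an actual model would use; the function names `sigmoid` and `decode_box` are hypothetical, and the sketch assumes that all values are expressed in grid-cell units:

```python
import math


def sigmoid(x):
    """Logistic function, squashing x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Convert the raw outputs for one anchor box into a bounding box.

    Assumption of this sketch: every value is in grid-cell units.
    """
    # The sigmoid keeps the predicted center inside the current cell.
    b_x = sigmoid(t_x) + c_x
    b_y = sigmoid(t_y) + c_y
    # The exponential rescales the anchor size by a positive factor.
    b_w = p_w * math.exp(t_w)
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# Raw outputs of zero place the center in the middle of cell (3, 2)
# and leave the anchor size unchanged.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=2, p_w=1.5, p_h=2.0))
# -> (3.5, 2.5, 1.5, 2.0)
```

Because the sigmoid output always lies between 0 and 1, the predicted center cannot leave the grid cell responsible for it, while the exponential guarantees that the predicted width and height stay positive.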
While these formulas may seem complex, the following diagram may help to clarify matters:
Figure 5.7: How YOLO refines and positions anchor boxes
In the preceding diagram, we see that on the left, the solid line is the anchor box, and the dotted line is the refined bounding...