In this section, we're going to see how the YOLO algorithm works. YOLO stands for you only look once. The name comes from the fact that you need only one execution of the neural network to get all predictions, which is possible because of the use of convolutional sliding windows.
YOLO solves the problem of the bounding box's accuracy. So as we saw in the previous section, we had this image:
With the help of the convolutional sliding window, we were able to detect all the window's predictions with one execution. So, for each of these windows, we can detect whether the selected pixels represent a car.
Now the problem is that even if we can do that, this window is kind of steady, making it incapable of representing a good bounding box. Observe the image carefully to notice that none of the cars is in a good bounding box.
...