We have learned that object detection gives us the output where a bounding box surrounds the object of interest in an image. For us to build an algorithm that detects the bounding box surrounding the object in an image, we would have to create the input–output mapping, where the input is the image and the output is the bounding boxes surrounding the objects in the given image.
Note that when we detect the bounding box, we are detecting the pixel locations of the top-left corner of the bounding box surrounding the image, and the corresponding width and height of the bounding box.
To train a model that provides the bounding box, we need the image, and also the corresponding bounding-box coordinates of all the objects in an image.
In this section, we will highlight one of the ways to create the training dataset where the image shall...