Creating a bounding-box ground truth for training
We have learned that object detection gives us output in the form of a bounding box surrounding the object of interest in an image. For us to build an algorithm that detects this bounding box, we would have to create input-output combinations, where the input is the image and the output is the bounding boxes and the object classes.
Note that when we detect the bounding box, we are detecting the pixel locations of the four corners of the box surrounding the object, not the image as a whole.
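To make the idea of corner pixel locations concrete, here is a minimal sketch. It assumes the common convention of storing an axis-aligned box as (xmin, ymin, xmax, ymax) in pixels, with the origin at the top-left of the image; the function name `box_corners` and the sample values are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def box_corners(xmin: int, ymin: int, xmax: int, ymax: int) -> List[Tuple[int, int]]:
    """Return the four (x, y) corner pixels of an axis-aligned box.

    Order: top-left, top-right, bottom-right, bottom-left.
    Assumes image coordinates, where y grows downward.
    """
    if xmin > xmax or ymin > ymax:
        raise ValueError("expected xmin <= xmax and ymin <= ymax")
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]


# Hypothetical box around one object in an image
corners = box_corners(50, 30, 200, 180)
print(corners)
```

Because the box is axis-aligned, the two opposite corners (xmin, ymin) and (xmax, ymax) are enough to recover all four, which is why annotation files typically store only those four numbers.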
To train a model that provides the bounding box, we need the image and the corresponding bounding-box coordinates of all the objects in the image. In this section, we will learn one way to create the training dataset, where the image is the input and the corresponding bounding boxes and classes of objects are stored in an XML file as output.
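As a rough illustration of what such an XML file might contain and how it could be read back, the sketch below assumes a Pascal VOC-style layout (an `object` element with a `name` and a `bndbox` holding `xmin`, `ymin`, `xmax`, `ymax`). The exact tags written by the annotation tool may differ; the sample XML, the file name, and the helper `read_annotation` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in an assumed Pascal VOC-style layout
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>truck</name>
    <bndbox><xmin>50</xmin><ymin>30</ymin><xmax>200</xmax><ymax>180</ymax></bndbox>
  </object>
  <object>
    <name>bus</name>
    <bndbox><xmin>220</xmin><ymin>40</ymin><xmax>400</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>"""


def read_annotation(xml_text):
    """Parse one annotation and return (filename, [(class_name, box), ...]).

    Each box is a tuple (xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Go through float first so values such as "50.0" still parse
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), coords))
    return filename, objects


filename, objects = read_annotation(SAMPLE_XML)
print(filename, objects)
```

Each (image, list of class and box pairs) result is one input-output combination of the kind described above: the image is the model input, and the classes and boxes are the training targets.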
Here, we will install and use ybat to create (annotate) bounding boxes around objects in the image....