Object detection is a supervised learning problem that requires a considerable amount of data to reach good performance. The process of carefully annotating images by drawing bounding boxes around the objects and assigning them the correct labels is a time-consuming process that requires several hours of repetitive work.
Fortunately, there are already several datasets for object detection that are ready to use. The most famous is the ImageNet dataset, immediately followed by the PASCAL VOC 2007 dataset. To be able to use ImageNet, dedicated hardware is required since its size and number of labeled objects per image makes the object detection task hard to tackle.
PASCAL VOC 2007, instead, consists of only 9,963 images in total, each of them with a different number of labeled objects belonging to the 20 selected object classes. The twenty object classes are as follows...