In this section, we'll learn to solve the object detection problem of detecting multiple object positions using the sliding window technique. At the end of this section, we'll explore the downsides of using this method.
The very first step is to train a convolutional neural network (CNN)as a VGG-16 or maybe a more advanced architecture, such as the inception network, or even the residual network. We'll use cropped images as the training data, instead of the original size of the images. The cropped images will contain only the object in question.
In this example, we'll use the car detection problem and the images will contain cars. When it comes to practical implementation, there are two things that we need to keep in mind:
- It would help if we used different crop sizes for different images.
- Add images other...