Features engineering for the baseline model
In order to build the baseline model, we will use the Caffe implementation of the Google MobileNet SSD detection network with pre-trained weights. This model has been trained on the PASCAL VOC dataset. So, in this section, we will look at the approach with which this model has been trained by Google. We will understand the basic approach behind MobileNet SSD and use the pre-trained model to help save time. To create this kind of accurate model, we need to have lots of GPUs and training time, so we are using a pre-trained model. This pre-trained MobileNet model uses Convolution Neural Net (CNN).
Let's look at how the features have been extracted by the MobileNet using CNN. This will help us understand the basic idea behind CNN, as well as how MobileNet has been used. The CNN network is made of layers and, when we provide the images to CNN, it scans the region of the images and tries to extract the possible objects using the region proposal method...