As mentioned in the introduction of this chapter, we have already been introduced to the concepts behind object recognition using CNNs. For this case, we used a trained model to perform classification; it achieved this by learning a set of feature maps using convolutional layers that are fed into fully connected (or dense) layers and, finally, their output, through an activation layer which gave us the probability for each of the classes. The class was inferred by selecting the one with the largest probability.
Let's differentiate between object recognition, object localization, and object detection. Object recognition is the task of classifying the most dominant object in a image while object localization performs classification and predicts an object's bounding box. Object detection further extends this and allows...