So far, we have limited ourselves to recognizing the single most dominant object within an image using a convolutional neural network (CNN). We have seen how a model can be trained to take in an image and extract a series of feature maps, which are then fed into a fully connected layer to output a probability distribution over a set of classes. This distribution is then interpreted to classify the object within the image, as shown here:
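The final interpretation step can also be sketched in a few lines of code. The following is a minimal illustration only, not the model from the earlier chapters: it assumes the fully connected layer has already produced one raw score (logit) per class, and the `labels` and `logits` values, as well as the `softmax` and `classify` helper names, are hypothetical and chosen purely for the example.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(logits, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical class names and scores, standing in for the output
# of the network's fully connected layer.
labels = ["cat", "dog", "person"]
logits = [2.0, 0.5, -1.0]

label, confidence = classify(logits, labels)
print(label, round(confidence, 3))
```

Because only the single highest-probability class is kept, this approach reports one label per image, which is the limitation the rest of this chapter sets out to remove.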
In this chapter, we will build on this and explore how we can detect and locate multiple objects within a single image. We will start by building up our understanding of how this works and then walk through implementing an image search for a photo gallery application. This application allows the user to filter and sort images based not only on which objects are present in the image, but also on their position relative...