Chapter 6. Locating with Spatial Transformer Networks
In this chapter, the NLP field is left to come back to images, and get an example of application of recurrent neural networks to images. In Chapter 2, Classifying Handwritten Digits with a Feedforward Network we addressed the case of image classification, consisting of predicting the class of an image. Here, we'll address object localization, a common task in computer vision as well, consisting of predicting the bounding box of an object in the image.
While Chapter 2, Classifying Handwritten Digits with a Feedforward Network solved the classification task with neural nets built with linear layers, convolutions, and non-linarites, the spatial transformer is a new module built on very specific equations dedicated to the localization task.
In order to locate multiple objects in the image, spatial transformers are composed with recurrent networks. This chapter takes the opportunity to show how to use prebuilt recurrent networks...