Region-based localization networks
Historically, the basic approach in object localization was to use a classification network in a sliding window; it consists of sliding a window one pixel by one pixel in each direction and applying a classifier at each position and each scale in the image. The classifier learns to say if the object is present and centered. It requires a large amount of computations since the model has to be evaluated at every position and scale.
To accelerate such a process, the Region Proposal Network (RPN) in the Fast-R-CNN paper from the researcher Ross Girshick consists of transforming the fully connected layers of a neural net classifier such as MNIST CNN into convolutional layers as well; in fact, network dense on 28x28 image, there is no difference between a convolution and a linear layer when the convolution kernel has the same dimensions as the input. So, any fully connected layers can be rewritten as convolutional layers, with the same weights and the appropriate...