A localization network
In Spatial Transformer Networks (STN), instead of applying the network directly to the input image, the idea is to add a module that preprocesses the image by cropping, rotating, and scaling it to fit the object, which assists classification.
For that purpose, STNs use a localization network that predicts the parameters of an affine transformation, which is then applied to the input.
In Theano, differentiation through the affine transformation is automatic; we simply have to connect the localization net to the input of the classification net through the affine transformation.
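To make the role of the six affine parameters concrete, here is a minimal NumPy sketch, independent of Theano and Lasagne. It assumes the six values are read as a 2x3 matrix acting on normalized output coordinates in [-1, 1], and it uses nearest-neighbor sampling for brevity, whereas a spatial transformer normally uses bilinear interpolation so that gradients can flow through the sampling step. The function names `affine_grid` and `sample_nearest` are hypothetical helpers chosen for this illustration.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Return source (x, y) coordinates for every output pixel.

    theta holds six values, read row by row as a 2x3 affine matrix
    (an assumption of this sketch).
    """
    theta = np.asarray(theta, dtype=np.float64).reshape(2, 3)
    # Normalized output coordinates in [-1, 1].
    xs = np.linspace(-1.0, 1.0, out_w)
    ys = np.linspace(-1.0, 1.0, out_h)
    grid_x, grid_y = np.meshgrid(xs, ys)
    # Homogeneous coordinates: one column (x, y, 1) per output pixel.
    coords = np.stack([grid_x.ravel(), grid_y.ravel(), np.ones(out_h * out_w)])
    src = theta @ coords  # shape (2, out_h * out_w)
    return src[0].reshape(out_h, out_w), src[1].reshape(out_h, out_w)

def sample_nearest(image, src_x, src_y):
    """Sample a 2D image at normalized coordinates (nearest neighbor)."""
    h, w = image.shape
    # Convert normalized coordinates to pixel indices, clipped to the image.
    px = np.clip(np.rint((src_x + 1.0) * (w - 1) / 2.0), 0, w - 1).astype(int)
    py = np.clip(np.rint((src_y + 1.0) * (h - 1) / 2.0), 0, h - 1).astype(int)
    return image[py, px]

image = np.arange(16, dtype=np.float64).reshape(4, 4)
identity = [1, 0, 0, 0, 1, 0]    # leaves the image unchanged
zoom = [0.5, 0, 0, 0, 0.5, 0]    # samples only the central region

same = sample_nearest(image, *affine_grid(identity, 4, 4))
zoomed = sample_nearest(image, *affine_grid(zoom, 4, 4))
```

With the identity parameters the output equals the input, while scaling the diagonal terms by 0.5 restricts sampling to the center of the image, which is the crop-and-zoom behavior the localization network is meant to learn.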
First, we create a localization network, similar to the MNIST CNN model, that predicts the six parameters of the affine transformation:
l_in = lasagne.layers.InputLayer((None, dim, dim))
l_dim = lasagne.layers.DimshuffleLayer(l_in, (0, 'x', 1, 2))
l_pool0_loc = lasagne.layers.MaxPool2DLayer(l_dim, pool_size=(2, 2))
l_dense_loc = mnist_cnn.model(l_pool0_loc, input_dim...