Although strided convolutions are often used in CNN architectures, average-pooling and max-pooling remain the most common operations for reducing the spatial dimensions of images. For this reason, Zeiler and Fergus also proposed a max-unpooling operation (often simply called unpooling) to pseudo-reverse max-pooling. They used this operation within a network they called a deconvnet, to decode and visualize the features of their convnet (that is, a CNN). In the paper describing their ILSVRC 2013-winning solution (Visualizing and understanding convolutional networks, Springer, 2014), they explain that, even though max-pooling is not invertible (that is, we cannot mathematically recover the non-maximum values that the operation discards), it is possible to define an operation that approximates its inverse, at least in terms of spatial sampling.
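To make the non-invertibility concrete, the following minimal sketch pools a small 4 × 4 image and then places each maximum back at its original location. It is an illustration under stated assumptions, not the authors' implementation: the function names `max_pool_2x2` and `max_unpool_2x2` are hypothetical, the window is fixed at 2 × 2 with a stride of 2, discarded values are assumed to be replaced with zeros, and plain Python lists are used to keep the example dependency-free.

```python
def max_pool_2x2(image):
    """2x2 max-pooling with stride 2 (assumed window size).

    Returns the pooled values and, for each of them, the (row, col)
    position of the maximum in the input image.
    """
    pooled, positions = [], []
    for i in range(0, len(image), 2):
        pooled_row, pos_row = [], []
        for j in range(0, len(image[0]), 2):
            # Gather the four (value, position) pairs of the current window.
            window = [(image[r][c], (r, c))
                      for r in (i, i + 1) for c in (j, j + 1)]
            # Keep the largest value; ties resolve to the first one found.
            value, pos = max(window, key=lambda pair: pair[0])
            pooled_row.append(value)
            pos_row.append(pos)
        pooled.append(pooled_row)
        positions.append(pos_row)
    return pooled, positions


def max_unpool_2x2(pooled, positions, height, width):
    """Approximate inverse: put each maximum back where it came from.

    All other cells are filled with zeros (an assumption of this sketch),
    since the discarded values cannot be recovered.
    """
    restored = [[0] * width for _ in range(height)]
    for pooled_row, pos_row in zip(pooled, positions):
        for value, (r, c) in zip(pooled_row, pos_row):
            restored[r][c] = value
    return restored


image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]

pooled, positions = max_pool_2x2(image)
restored = max_unpool_2x2(pooled, positions, height=4, width=4)

print(pooled)    # [[4, 5], [6, 7]]
for row in restored:
    print(row)
# [0, 0, 0, 0]
# [4, 0, 0, 5]
# [0, 0, 7, 0]
# [6, 0, 0, 0]
```

The restored image has the same spatial dimensions as the input and the maxima sit at their original positions, but the twelve non-maximum values are lost. This is the sense in which the operation approximates the inverse only in terms of spatial sampling.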
To implement this pseudo-inverse operation, they first modified each max-pooling layer so that it outputs...