A CNN is generally composed of a cascade of linear and nonlinear operators. These operators are very effective in classifying images of all sorts but reveal little information about the contribution of the internal representation to the classification. The learning process of a CNN works by a regular reduction of large amounts of uninformative variability in the images to reveal the essence of the visual class.
However, the extent to which information is discarded is lost somewhere in the intermediate nonlinear processing steps. Also, there is a wide belief, that discarding information is essential for learning representations that generalize well to unseen data.
The authors of this paper show that discarding information is not necessary and propose to explain this theory with empirical evidence. This paper also provides an understanding of the variability reduction process by proposing an invertible convolutional network. The i-RevNet does not discard any information about the input while classifying images. It has a built-in pseudo-inverse, allowing for easy inversion. It basically uses linear and invertible operators for performing downsampling, instead of non-invertible variants like spatial pooling.
i-RevNet is an invertible deep network, which builds upon the recently introduced RevNet, where the non-invertible components of the original RevNets are replaced by invertible ones. i-RevNets retain all information about the input signal in any of their intermediate representations up until the last layer. They achieve the same performance on Imagenet compared to similar non-invertible RevNet and ResNet architectures.
The above image describes the blocks of an i-RevNet. The strategy implemented by an i-RevNet consists in an alternation between additions, and nonlinear operators, while progressively down-sampling the signal operators. The pair of the final layer is concatenated through a merging operator. Using this architecture, the authors avoid the non-invertible modules of a RevNet (e.g. max-pooling or strides) which are necessary to train them in a reasonable time and are designed to build invariance w.r.t. Translation variability. Their method replaces the non-invertible modules by linear and invertible modules Sj, that can reduce the spatial resolution while maintaining the layer’s size by increasing the number of channels.
Overall Score: 25/30
Average Score: 8.3
Reviewers agreed the paper is a strong contribution, despite some comments about the significance of the result; i.e., why is invertibility a "surprising" property for learnability, in the sense that F(x) = {x, phi(x)}, where phi is a standard CNN satisfies both properties: invertible and linear measurements of F producing good classification. Having said that, the reviews agreed that the paper is well written and easy to follow and considered it to be a great contribution to the ICLR conference.