Paper in Two minutes: i-RevNet, a deep invertible convolutional network

The ICLR 2018 accepted paper, i-RevNet: Deep Invertible Networks, introduces i-RevNet, an invertible convolutional network, that does not discard any information about the input while classifying images. This paper is authored by Jörn-Henrik Jacobsen, Arnold W.M. Smeulders, and Edouard Oyallon. The 6th annual ICLR conference is scheduled to happen between April 30 - May 03, 2018.

i-RevNet, a deep invertible convolutional network

What problem is the paper attempting to solve?

A CNN is generally composed of a cascade of linear and nonlinear operators. These operators are very effective in classifying images of all sorts but reveal little information about the contribution of the internal representation to the classification. The learning process of a CNN works by a regular reduction of large amounts of uninformative variability in the images to reveal the essence of the visual class.

However, the extent to which information is discarded is lost somewhere in the intermediate nonlinear processing steps. Also, there is a wide belief, that discarding information is essential for learning representations that generalize well to unseen data.

The authors of this paper show that discarding information is not necessary and propose to explain this theory with empirical evidence. This paper also provides an understanding of the variability reduction process by proposing an invertible convolutional network. The i-RevNet does not discard any information about the input while classifying images. It has a built-in pseudo-inverse, allowing for easy inversion. It basically uses linear and invertible operators for performing downsampling, instead of non-invertible variants like spatial pooling.

Paper summary

i-RevNet is an invertible deep network, which builds upon the recently introduced RevNet, where the non-invertible components of the original RevNets are replaced by invertible ones. i-RevNets retain all information about the input signal in any of their intermediate representations up until the last layer. They achieve the same performance on Imagenet compared to similar non-invertible RevNet and ResNet architectures.

The above image describes the blocks of an i-RevNet. The strategy implemented by an i-RevNet consists in an alternation between additions, and nonlinear operators, while progressively down-sampling the signal operators. The pair of the final layer is concatenated through a merging operator. Using this architecture, the authors avoid the non-invertible modules of a RevNet (e.g. max-pooling or strides) which are necessary to train them in a reasonable time and are designed to build invariance w.r.t. Translation variability. Their method replaces the non-invertible modules by linear and invertible modules Sj, that can reduce the spatial resolution while maintaining the layer’s size by increasing the number of channels.

Key Takeaways

This work provides a solid empirical evidence that learning invertible representations does not discard any information about their input on large-scale supervised problems.

i-RevNet, the invertible network proposed, is a class of CNN which is fully invertible and permits to exactly recover the input from its last convolutional layer.

i-RevNets achieve the same classification accuracy in the classification of complex datasets as illustrated on ILSVRC-2012 when compared to the RevNet and ResNet architectures with a similar number of layers.

The inverse network is obtained for free when training an i-RevNet, requiring only minimal adaption to recover inputs from the hidden representations.

Reviewer feedback summary

Overall Score: 25/30
Average Score: 8.3

Reviewers agreed the paper is a strong contribution, despite some comments about the significance of the result; i.e., why is invertibility a "surprising" property for learnability, in the sense that F(x) = {x, phi(x)}, where phi is a standard CNN satisfies both properties: invertible and linear measurements of F producing good classification. Having said that, the reviews agreed that the paper is well written and easy to follow and considered it to be a great contribution to the ICLR conference.