Recent years have witnessed a surge in demand for applications of visual object recognition, for instance, in self-driving cars and content-based image search. This demand has been fueled by the astonishing progress of convolutional networks (CNNs), whose state-of-the-art models may even have surpassed human-level performance. However, most of these are complex models with high computational demands at inference time. In real-world applications, computation is never free; it directly translates into power consumption, which should be minimized for environmental and economic reasons.
Ideally, all systems should automatically use small networks when test images are easy or computational resources are limited and use big networks when test images are hard or computation is abundant.
To enable resource-efficient image recognition, the authors aim to develop CNNs that slice the computation and process these slices one by one, stopping the evaluation once the CPU time is depleted or the classification is sufficiently certain. Unfortunately, CNNs learn the data representation and the classifier jointly, which leads to two problems: intermediate classifiers alter the internal representation on which later classifiers depend, and early layers lack the coarse-scale features needed for accurate classification.
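To make the early-exit idea concrete, here is a minimal PyTorch-style sketch of budgeted inference. The names `budgeted_predict`, `stages`, `classifiers`, and `budget_s` are illustrative assumptions, not the authors' API; the stopping rule (confidence threshold or elapsed time) mirrors the behavior described above.

```python
import time

import torch


def budgeted_predict(stages, classifiers, x, budget_s, threshold=0.9):
    """Evaluate network slices one by one, exiting as soon as the time
    budget is spent or the prediction is sufficiently certain.

    stages/classifiers are assumed to be matched lists of modules; this
    is a sketch, not the paper's exact procedure.
    """
    start = time.time()
    logits, h = None, x
    for stage, clf in zip(stages, classifiers):
        h = stage(h)                  # process the next slice of computation
        logits = clf(h)               # intermediate prediction from this slice
        confidence = torch.softmax(logits, dim=1).max().item()
        if confidence >= threshold or (time.time() - start) >= budget_s:
            break                     # budget depleted or sufficiently certain
    return logits
```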
The authors propose a novel network architecture that addresses both problems through careful design changes, allowing for resource-efficient image classification.
The model is based on a multi-scale convolutional neural network similar to the neural fabric, but with dense connections and a classifier at each layer. This novel architecture, called the Multi-Scale DenseNet (MSDNet), addresses both of the problems described above (intermediate classifiers altering the internal representation, and the lack of coarse-scale features in early layers) to enable resource-efficient image classification.
The network attaches a cascade of intermediate classifiers throughout its depth. The first problem is addressed through dense connectivity: by connecting all layers to all classifiers, features are no longer dominated by the most imminent early exit, and the trade-off between early and later classification can be handled elegantly as part of the loss function.
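One plausible form of such a loss is a weighted sum of per-classifier cross-entropy terms, sketched below; `cascade_loss` and `weights` are hypothetical names introduced here for illustration, and the specific weighting scheme is an assumption rather than the paper's exact formulation.

```python
import torch.nn.functional as F


def cascade_loss(all_logits, target, weights):
    """Weighted sum of cross-entropy losses over every intermediate
    classifier; the weights encode the early-vs-late trade-off."""
    return sum(w * F.cross_entropy(logits, target)
               for w, logits in zip(weights, all_logits))
```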
The second problem is addressed by adopting a multi-scale network structure: at each layer, features of all scales (fine to coarse) are produced. The coarse features facilitate good classification early on, while the fine, low-level features only become useful after several more layers of processing.
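The sketch below illustrates one way a layer can maintain features at every scale, with each coarser scale fed both by a same-scale convolution and by a strided convolution from the next-finer scale. `MultiScaleBlock`, `channels`, and `num_scales` are illustrative names; for brevity it assumes equal channel width per scale and sums feature maps instead of using the dense concatenation the architecture actually relies on.

```python
import torch.nn as nn


class MultiScaleBlock(nn.Module):
    """One layer that keeps feature maps at several scales (finest first).

    Each coarser scale combines a same-scale convolution with a strided
    convolution applied to the next-finer scale; a sketch, not MSDNet.
    """
    def __init__(self, channels, num_scales=3):
        super().__init__()
        self.same = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1)
            for _ in range(num_scales))
        self.down = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(num_scales - 1))

    def forward(self, feats):
        # feats: list of feature maps, one per scale, finest first
        out = [self.same[0](feats[0])]
        for s in range(1, len(feats)):
            out.append(self.same[s](feats[s]) + self.down[s - 1](feats[s - 1]))
        return out
```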
Overall Score: 25/30
Average Score: 8.33
The reviewers found the approach natural and effective, with good results. They found the presentation clear and easy to follow, and the structure of the network clearly justified. They found the use of dense connectivity to avoid the performance loss caused by early-exit classifiers interesting. They appreciated the results and found them quite promising, with 5x speed-ups at the same or better accuracy than previous models. However, some reviewers suggested that the results for the more efficient DenseNet* be moved into the main paper.