Advances in application of deep learning in computer vision are often highly focalized on the kind of classification problems that can be summarized by challenges such as ImageNet (but also, for instance, PASCAL VOC - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) and the ConvNets suitable to crack it (Xception, VGG16, VGG19, ResNet50, InceptionV3, and MobileNet, just to quote the ones available in the well-known package Keras: https://keras.io/applications/).
Though deep learning networks based on ImageNet data are the actual state of the art, such networks can experience difficulties when faced with real-world applications. In fact, in practical applications, we have to process images that are quite different from the examples provided by ImageNet. In ImageNet the elements to be classified are clearly the only clear element present...