Using rank-N accuracy to evaluate performance
Most of the time, when we're training deep learning-based image classifiers, we care about accuracy: a binary measure of a model's performance, based on a one-to-one comparison between its predictions and the ground-truth labels. When the model says there's a leopard in a photo, is there actually a leopard there? In other words, we measure how often the model's single top prediction is exactly right.
However, for more complex datasets, this way of assessing a network's learning can be counterproductive, and even unfair, because it's too restrictive. What if the model classified the feline in the picture not as a leopard but as a tiger, and the second most probable class was, indeed, a leopard? This means the model still has some learning to do, but it's getting there. That's valuable!
This is the reasoning behind rank-N accuracy, a more lenient and fairer way of measuring a predictive model's...
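The idea can be sketched in a few lines of NumPy: a prediction counts as correct if the ground-truth label appears anywhere among the model's N highest-scoring classes. The function name `rank_n_accuracy` and the toy probability matrix below are illustrative, not part of any particular library:

```python
import numpy as np

def rank_n_accuracy(probabilities, labels, n=5):
    """Fraction of samples whose true label is among the
    n highest-scoring predictions (rank-N, or top-N, accuracy)."""
    # Indices of the top-n classes for each sample, most probable first.
    top_n = np.argsort(probabilities, axis=1)[:, ::-1][:, :n]
    hits = sum(label in row for row, label in zip(top_n, labels))
    return hits / len(labels)

# Toy example: 3 samples, 4 classes.
probs = np.array([
    [0.1, 0.6, 0.2, 0.1],  # true class 2 is the second most probable
    [0.7, 0.1, 0.1, 0.1],  # true class 0 is the top prediction
    [0.3, 0.3, 0.2, 0.2],  # true class 3 only ranks third
])
labels = [2, 0, 3]

print(rank_n_accuracy(probs, labels, n=1))  # 0.333... (plain accuracy)
print(rank_n_accuracy(probs, labels, n=2))  # 0.666...
```

Note that rank-1 accuracy is just the usual accuracy, so the metric strictly generalizes it: increasing N can only keep the score the same or raise it.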