Evaluating misclassifications with gradient-based attribution methods
Gradient-based methods calculate an attribution map for each classification using both a forward and a backward pass through the CNN. As the name suggests, these methods leverage the gradients computed in the backward pass to produce the attribution maps. All of them are local interpretation methods because they derive a single interpretation per sample. Incidentally, attribution in this context means that we are attributing the predicted label to areas of an image. In academic literature, attribution maps are often called sensitivity maps, too.
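To make the forward and backward passes concrete, here is a minimal sketch of the simplest gradient-based attribution, where the map is the absolute gradient of the predicted class score with respect to each pixel. It is only an illustration under stated assumptions: a tiny fully connected ReLU network with random weights stands in for the CNN, and the names forward, gradient_attribution, W1, b1, W2, and b2 are hypothetical and do not come from this chapter's code.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of a toy two-layer ReLU network (a stand-in for a CNN)."""
    z1 = W1 @ x.reshape(-1) + b1      # hidden pre-activations
    h1 = np.maximum(z1, 0.0)          # ReLU
    logits = W2 @ h1 + b2             # one score per class
    return z1, logits

def gradient_attribution(x, W1, b1, W2, b2):
    """Return the predicted class and its absolute-gradient attribution map."""
    z1, logits = forward(x, W1, b1, W2, b2)
    predicted = int(np.argmax(logits))

    # Backward pass: gradient of the predicted class score w.r.t. the input
    grad_h1 = W2[predicted]           # d score / d hidden activations
    grad_z1 = grad_h1 * (z1 > 0)      # ReLU passes gradient only where active
    grad_x = W1.T @ grad_z1           # d score / d pixels

    # One attribution value per pixel, in the shape of the image
    return predicted, np.abs(grad_x).reshape(x.shape)

# Hypothetical data: a random 8x8 "image" and random weights for 3 classes
rng = np.random.default_rng(0)
x = rng.random((8, 8))
W1, b1 = rng.normal(size=(16, 64)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

predicted, attribution = gradient_attribution(x, W1, b1, W2, b2)
print(predicted, attribution.shape)
```

Pixels with large values in attribution are those where a small change would move the predicted class score the most, which is why the map is local to this one sample. In practice, a deep learning framework's automatic differentiation would compute the same gradient for a real CNN.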
To get started, we will first need to create an array with all of our misclassified samples (X_misclass) from the validation dataset (X_val). Many of these methods can compute attribution maps in batches, so having all the samples in one array simplifies the process. We can then print the shape of our misclassifications array to ensure that all 17 samples are there:
idxs = avocado_FN_idxs + grapefruit_FP_idxs...