Evaluating misclassifications with gradient-based attribution methods
Gradient-based methods calculate an attribution map for each classification using both forward and backward passes through the CNN. As the name suggests, these methods leverage the gradients from the backward pass to compute the attribution maps. All of these methods are local interpretation methods because they derive a single interpretation per sample. Incidentally, attribution in this context means that we are attributing the predicted label to areas of an image. In academic literature, attribution maps are often called sensitivity maps, too.
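To make the forward and backward passes concrete, here is a minimal sketch of the simplest gradient-based attribution: the absolute gradient of the predicted class score with respect to each input. It assumes a toy two-layer ReLU network written in plain Python, not the CNN or any attribution library used in this chapter. The names forward, saliency, W1, and W2 are hypothetical and exist only for illustration.

```python
def forward(x, W1, W2):
    """Forward pass of a toy network: logits = W2 @ relu(W1 @ x)."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [max(0.0, p) for p in pre]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    return pre, logits


def saliency(x, W1, W2):
    """Return the predicted class and |d logit_pred / d x| per input."""
    pre, logits = forward(x, W1, W2)
    pred = max(range(len(logits)), key=logits.__getitem__)

    # Backward pass, written out by hand for this toy network.
    grad_hidden = W2[pred]                       # d logit_pred / d hidden
    grad_pre = [g if p > 0 else 0.0              # ReLU passes gradient only
                for g, p in zip(grad_hidden, pre)]  # where it was active
    grad_x = [sum(W1[j][i] * grad_pre[j] for j in range(len(W1)))
              for i in range(len(x))]            # d logit_pred / d x
    return pred, [abs(g) for g in grad_x]


# Hypothetical "image" of four pixels and hand-picked weights.
x = [1.0, 0.5, 0.0, 2.0]
W1 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, -1.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.5]]
W2 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 2.0]]

pred, attribution = saliency(x, W1, W2)
print(pred, attribution)  # only the fourth pixel gets nonzero attribution
```

In this toy case, the prediction is attributed entirely to the fourth pixel because it is the only input connected to the predicted class through an active ReLU unit. With a real CNN, the same idea yields one attribution value per pixel, and a deep learning framework's automatic differentiation performs the backward pass.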
To get started, we will first need to create an array with all of our misclassification samples (X_misclass) from the test dataset (test_data) using the combined indexes for all of our misclassifications of interest (misclass_idxs). Since there aren’t that many misclassifications, we are loading a single batch of them (next):
misclass_idxs = metal_FP_idxs + plastic_FN_idxs...