Understanding classifications with perturbation-based attribution methods
Perturbation-based methods have already been covered extensively in this book: SHAP, LIME, anchors, and even permutation feature importance all employ perturbation-based strategies. The intuition behind them is that if you remove, alter, or mask features in your input data and then make predictions with the perturbed input, you can attribute the difference between the new predictions and the original predictions to the changes you made in the input. These strategies can be leveraged in both global and local interpretation methods.
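To make this intuition concrete, here is a minimal occlusion-style sketch that is not taken from this chapter: the dataset, the random forest model, and the mean-value baseline used to "remove" a feature are all assumptions chosen purely for illustration. Each feature is replaced by its baseline value, one at a time, and the resulting drop in the predicted probability is treated as that feature's attribution:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Illustrative setup only: any fitted classifier and a single instance would do
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
x = X[0]                              # instance whose prediction we attribute
baseline = X.mean(axis=0)             # "removed" features take their mean value

orig_prob = model.predict_proba(x.reshape(1, -1))[0, 1]
attributions = np.zeros_like(x)
for j in range(len(x)):
    x_perturbed = x.copy()
    x_perturbed[j] = baseline[j]      # perturb a single feature
    new_prob = model.predict_proba(x_perturbed.reshape(1, -1))[0, 1]
    attributions[j] = orig_prob - new_prob   # prediction drop ≈ importance

Perturbation-based methods for images apply the same idea to patches of pixels rather than to individual tabular features.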
We will now do the same as we did with the misclassified samples, but for the chosen true positives, gathering four of each class in a single tensor (X_correctcls):
correctcls_idxs = wglass_TP_idxs[:4] + battery_TP_idxs[:4]
correctcls_data = torch.utils.data.Subset(test_data, correctcls_idxs)
# Batch all eight samples together so they end up in a single tensor
correctcls_loader = torch.utils.data.DataLoader(correctcls_data, batch_size=8)
X_correctcls, y_correctcls = next(iter(correctcls_loader))
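As a quick sanity check (assuming, as with the earlier loaders, that the dataset yields image and label pairs), you can confirm that all eight samples landed in one batch:

print(X_correctcls.shape)   # expect the first (batch) dimension to be 8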