Chapter 7: Anchor and Counterfactual Explanations
In previous chapters, we learned how to attribute model decisions to features and their interactions using state-of-the-art global and local model interpretation methods. However, decision boundaries are not always easy to define or interpret with these methods. Wouldn't it be nice to derive human-interpretable rules from model interpretation methods? In this chapter, we will cover a few human-interpretable, local, classification-only model interpretation methods. We will first learn how to use scoped rules called anchors to explain complex models with statements such as if X conditions are met, then Y is the outcome. Then, we will explore counterfactual explanations, which follow the form if Z conditions aren't met, then Y is not the outcome. Lastly, we will explain how contrastive explanations combine both anchors and counterfactuals into a statement such as Y is the outcome if X conditions are met and Z conditions...
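Before diving into the methods themselves, the two statement forms can be illustrated with a toy sketch. The snippet below is purely hypothetical (the loan-approval model, feature names, and thresholds are invented for illustration) and does not implement any anchor-search or counterfactual-search algorithm; it only shows what the resulting explanations look like as rules.

```python
# Toy illustration of anchor and counterfactual statement forms.
# The model, features, and thresholds below are hypothetical, chosen
# only to make the IF/THEN structure of the explanations concrete.

def model(applicant):
    """Stand-in black-box classifier for a loan decision."""
    if applicant["income"] >= 50_000 and applicant["credit_score"] >= 650:
        return "approved"
    return "denied"

# Anchor-style rule: IF these conditions hold, THEN the outcome is fixed.
anchor = {
    "income": lambda v: v >= 50_000,
    "credit_score": lambda v: v >= 650,
}

def anchor_holds(applicant, anchor):
    """Check whether every anchor condition is satisfied."""
    return all(cond(applicant[feat]) for feat, cond in anchor.items())

applicant = {"income": 60_000, "credit_score": 700}
if anchor_holds(applicant, anchor):
    print("IF income >= 50,000 AND credit_score >= 650 "
          "THEN outcome = approved")

# Counterfactual-style statement: a small change that flips the outcome.
counterfactual = dict(applicant, credit_score=600)
print(model(applicant), "->", model(counterfactual))  # approved -> denied
```

In this toy setting the rules are read directly off the model; the methods covered in this chapter instead discover such rules automatically for models whose internals are opaque.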