Exploring the Transformers Interpret toolbox
As reviewed in the first section of this chapter, post-hoc explainability tools fall into two major families: perturbation-based and gradient-based. SHAP belongs to the perturbation-based family. Now, let's look at a gradient-based toolbox called Transformers Interpret (https://github.com/cdpierse/transformers-interpret). Although relatively new, it is built on top of Captum (https://github.com/pytorch/captum), a unified model interpretability and understanding library for PyTorch that provides a common API for both perturbation-based and gradient-based methods (https://arxiv.org/abs/2009.07896). Transformers Interpret further simplifies Captum's API so that we can quickly explore gradient-based explainability methods and get some hands-on experience.
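Before turning to the library itself, it may help to see what a gradient-based attribution actually computes. The following is a minimal gradient × input sketch in plain NumPy on a toy logistic-regression "model" (an illustration of the principle only, not the Transformers Interpret or Captum API; the weights and input are made up for the example):

```python
import numpy as np

# Toy "model": a logistic score f(x) = sigmoid(w . x + b).
# Gradient-based explainers attribute the prediction to input features
# via the gradient of the output with respect to the input.
w = np.array([2.0, -1.0, 0.5])
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return sigmoid(w @ x + b)

def gradient_x_input(x):
    """Gradient x input attribution for the toy model.

    By the chain rule, df/dx = f(x) * (1 - f(x)) * w, so each
    feature's attribution is that gradient times the feature value.
    """
    p = predict(x)
    grad = p * (1.0 - p) * w
    return grad * x

x = np.array([1.0, 2.0, 3.0])
attributions = gradient_x_input(x)
# Features that push the score up get positive attributions,
# features that push it down get negative ones.
print(attributions)
```

Transformers Interpret applies the same idea to a transformer: gradients are taken with respect to the token embeddings, yielding a per-token attribution score for the model's prediction.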
To get started, first make sure you have the dl-explain virtual environment set up and activated, as described in the previous section. Then, we...