Understanding post hoc explainability
Post hoc explainability refers to applying explainability techniques after model training. Generally, post hoc explainability methods approximate model behavior by correlating features and predictions. Hence, assessing the quality of explanations, such as faithfulness and monotonicity, which we will review in Chapter 9, is crucial when using these methods. This section discusses how to achieve post hoc explainability locally and globally. We will walk through an example of explaining an image classifier using LIME.
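Before the full image-classifier walkthrough, LIME's core idea can be previewed on tabular data: sample perturbations around the instance to explain, weight them by proximity, and fit an interpretable surrogate to the black-box predictions. The black-box function, kernel width, and sample count below are all illustrative assumptions, not the book's example:

```python
import numpy as np

# Hypothetical black-box model: prediction driven almost entirely by feature 0.
def black_box(X):
    return 3.0 * X[:, 0] + 0.1 * X[:, 1]

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])                     # instance to explain

# 1. Sample perturbations in the neighborhood of x.
Z = x + rng.normal(scale=0.5, size=(500, 2))

# 2. Weight samples by proximity to x (exponential kernel, width 0.25).
dists = np.linalg.norm(Z - x, axis=1)
weights = np.exp(-(dists ** 2) / 0.25)

# 3. Fit a weighted linear surrogate to the black-box predictions.
W = np.sqrt(weights)[:, None]
A = np.hstack([Z, np.ones((len(Z), 1))])     # add intercept column
coef, *_ = np.linalg.lstsq(W * A, W[:, 0] * black_box(Z), rcond=None)

print(coef[:2])                              # local weights, close to [3.0, 0.1]
```

The surrogate's coefficients are the explanation: they approximate how each feature drives the prediction near `x`, even though the black box itself is never inspected.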
Post hoc global explainability
ML models learn by training on large amounts of data, encoding the acquired knowledge in their structure and parameters. Traditional pipelines use feature engineering to transform raw data into features; ML models then map these learned representations to outputs.
Post hoc global explainability generally measures feature importance by assessing how much model accuracy degrades after permuting the values of a specific...
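The permutation idea can be sketched in a few lines: shuffle one feature at a time to break its link with the target, and record the resulting accuracy drop. The toy data and threshold "model" below are illustrative stand-ins, not a trained classifier:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] > 0).astype(int)        # labels depend only on feature 0

def model_predict(X):
    # Stand-in for a trained model: thresholds feature 0.
    return (X[:, 0] > 0).astype(int)

def accuracy(X, y):
    return float(np.mean(model_predict(X) == y))

baseline = accuracy(X, y)            # 1.0 on this toy setup
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # break the feature-target link
    print(f"feature {j}: importance = {baseline - accuracy(Xp, y):.3f}")
```

Only feature 0 shows a large accuracy drop (around 0.5, since a shuffled feature 0 predicts no better than chance), while the unused features score near zero, which is exactly the ranking a global importance measure should recover.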