Human-grounded evaluation framework
Explanations are practical and helpful when they enable the target audience to build a mental representation of model behavior and grasp the model's inferential process. The target audience includes both end users without domain knowledge and expert users who can provide informed feedback.
Measuring human simulatability is essential for evaluating the extent of a person's understanding of an ML model's behavior. There are two types of human simulatability:
- Forward simulation: A human predicts a model's output based on a given input. For example, ask a user to estimate the price a model would assign to a house given its zip code.
- Counterfactual simulation: Given an input and its output, a human predicts how the model's output would change, or makes a causal judgment, if the input were different. For example, ask a user to predict whether they would still miss a flight had they arrived at the airport 20 minutes earlier.
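The two simulation tasks above can be scored the same way: compare each participant's prediction with the model's actual output and report the agreement rate. A minimal sketch, using hypothetical study data and a made-up `forward_simulatability` helper:

```python
# Minimal sketch of scoring forward simulatability.
# All names and data below are hypothetical placeholders, not from a real study.

def forward_simulatability(human_predictions, model_outputs):
    """Fraction of cases where a human correctly predicted the model's output."""
    if len(human_predictions) != len(model_outputs):
        raise ValueError("prediction lists must be the same length")
    matches = sum(h == m for h, m in zip(human_predictions, model_outputs))
    return matches / len(model_outputs)

# Hypothetical study: users predict whether the model labels a house "expensive".
model_labels = ["expensive", "cheap", "expensive", "cheap"]
user_guesses = ["expensive", "cheap", "cheap",     "cheap"]
print(forward_simulatability(user_guesses, model_labels))  # → 0.75
```

Counterfactual simulation can reuse the same scoring function: the human's prediction for the perturbed input is compared against the model's output on that perturbed input.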
Having end users simulate model predictions provides insights into...