Appreciating what hinders machine learning interpretability
In the last section, we were wondering why the chart with ap_hi
versus weight
didn't have a conclusive pattern. It could very well be that although weight
is a risk factor, there are other critical mediating variables that could explain the increased risk of CVD. A mediating variable is one that influences the strength between the independent and target (dependent) variable. We probably don't have to think too hard to find what is missing. In Chapter 1, Interpretation, Interpretability, and Explainability; and Why Does It All Matter?, we performed linear regression on weight
and height
because there's a linear relationship between these variables. In the context of human health, weight
is not nearly as meaningful without height
, so you need to look at both.
Perhaps if we plot the decision regions for these two variables, we will get some clues. We can plot them with the following code:
fig, ax = plt.subplots...