Detecting bias and explaining model predictions for healthcare coverage amounts
Bias in ML models built to predict critical healthcare metrics can erode trust in this technology and prevent large-scale adoption. In this exercise, we will start with sample data about healthcare coverage and expenses for about 1,000 patients belonging to different demographics. We will then train a model to predict the coverage-to-expense ratio for patients in different demographics. Following that, we will use SageMaker Clarify to generate bias metrics on our training data and trained model. We will also generate explanations of our predictions to understand why the model is predicting the way it is. Let's begin by acquiring the dataset for this exercise.
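To get an intuition for what the pre-training bias metrics measure before we run Clarify, here is a minimal sketch that computes two of them by hand, Class Imbalance (CI) and Difference in Positive Proportions in Labels (DPL), on a tiny toy dataset. The column names (`gender`, `high_coverage_ratio`) are hypothetical stand-ins, not the actual columns of the Synthea dataset:

```python
import pandas as pd

# Toy stand-in for the healthcare dataset (hypothetical column names).
df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M", "F", "M", "F"],
    "high_coverage_ratio": [1, 0, 1, 1, 1, 0, 1, 0],  # binarized label; 1 = favorable
})

facet = df["gender"] == "F"        # membership in the facet (sensitive group)
label = df["high_coverage_ratio"]

n_a = (~facet).sum()               # count outside the facet
n_d = facet.sum()                  # count inside the facet

# Class Imbalance: (n_a - n_d) / (n_a + n_d); 0 means equal representation
ci = (n_a - n_d) / (n_a + n_d)

# Difference in Positive Proportions in Labels: q_a - q_d, where q is the
# fraction of each group that carries the favorable label
dpl = label[~facet].mean() - label[facet].mean()

print(f"CI = {ci:.2f}, DPL = {dpl:.2f}")  # → CI = 0.00, DPL = 0.75
```

In this toy example the two groups are equally represented (CI = 0), yet the favorable label is far more common in one group (DPL = 0.75), which is exactly the kind of label imbalance Clarify flags in a pre-training bias report.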
Acquiring the dataset
The dataset used in this exercise is synthetically generated using Synthea, an open-source synthetic patient generator. To learn more about Synthea, you can visit the following link: https://synthetichealth...