Detecting bias in ML
For this chapter, I'd like to use the adult census income dataset from the University of California Irvine (UCI) ML repository (https://archive.ics.uci.edu/ml/datasets/adult). This dataset contains demographic information from census data, with income level as the prediction target. The goal is to predict whether a person earns more than, or at most, 50,000 United States dollars (USD) ($50K) per year based on the census information. This is a great example of an ML use case that includes socially sensitive categories such as gender and race, which is the type of use case that is under the most scrutiny and regulation to ensure fairness when producing an ML model.
In this section, we will analyze the dataset to detect bias in the training data, mitigate any bias we find, train an ML model, and analyze whether the model is biased against a particular group.
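Before turning to the notebook, it may help to see what a data bias check can look like in plain Python. The following is only a minimal sketch, not the code from the notebook: the records are made-up toy rows rather than real census data, the helper names class_imbalance and label_proportion_gap are hypothetical, and the two measures (the imbalance in group sizes, and the gap in the share of >50K labels between groups) are assumed here as illustrative examples of pretraining bias measures. Only the column names sex and income and the label values follow the dataset.

```python
# Toy records that mimic two columns of the adult census dataset.
# ASSUMPTION: these rows are invented for illustration, not real census data.
records = [
    {"sex": "Male", "income": ">50K"},
    {"sex": "Male", "income": "<=50K"},
    {"sex": "Male", "income": ">50K"},
    {"sex": "Male", "income": "<=50K"},
    {"sex": "Male", "income": "<=50K"},
    {"sex": "Male", "income": ">50K"},
    {"sex": "Female", "income": "<=50K"},
    {"sex": "Female", "income": "<=50K"},
    {"sex": "Female", "income": ">50K"},
    {"sex": "Female", "income": "<=50K"},
]


def class_imbalance(rows, facet, group):
    """Hypothetical helper: (n_other - n_group) / (n_other + n_group).

    A value near 0 means the two groups are about equally represented;
    a positive value means `group` is underrepresented.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    n_group = sum(1 for row in rows if row[facet] == group)
    n_other = len(rows) - n_group
    return (n_other - n_group) / (n_other + n_group)


def label_proportion_gap(rows, facet, group, label, positive):
    """Hypothetical helper: share of positive labels outside `group`
    minus the share of positive labels inside `group`."""
    in_group = [row for row in rows if row[facet] == group]
    others = [row for row in rows if row[facet] != group]
    if not in_group or not others:
        raise ValueError("both groups need at least one record")
    share_group = sum(row[label] == positive for row in in_group) / len(in_group)
    share_others = sum(row[label] == positive for row in others) / len(others)
    return share_others - share_group


ci = class_imbalance(records, "sex", "Female")
dpl = label_proportion_gap(records, "sex", "Female", "income", ">50K")
print(f"class imbalance: {ci:.2f}")
print(f"label proportion gap: {dpl:.2f}")
```

On these toy rows, both values come out positive, meaning the Female group is both smaller and less often labeled >50K. On real data, the same two numbers give a first hint of whether the training set itself is skewed before any model is trained.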
Detecting pretraining bias
Please open the notebook in Getting-Started-with-Amazon-SageMaker...