Bias detection and explainability with Data Wrangler and Clarify
Now that we've explored and prepared our data, let's run a sanity check on it. While bias can mean many things, one common symptom is a dataset that has many more samples of one type of data than another, which can degrade our model's performance. We'll use Data Wrangler to see whether our input data is imbalanced and to understand which features are most important to our model.
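To make the idea of imbalance concrete, here is a minimal sketch in plain Python of two pre-training measures that Clarify reports: class imbalance (CI) and difference in proportions of labels (DPL). It illustrates the formulas only and is not Clarify's implementation. The helper names, the toy rows, and the city value B are made up for this example; the mobile and city column names mirror the ones used in the bias report below.

```python
def class_imbalance(rows, facet_col, facet_value):
    """CI = (n_a - n_d) / (n_a + n_d).

    n_d counts rows whose facet column equals facet_value;
    n_a counts all remaining rows. Values near 0 mean the two
    groups are of similar size; values near +1 or -1 mean one
    group dominates the dataset.
    """
    n_d = sum(1 for r in rows if r[facet_col] == facet_value)
    n_a = len(rows) - n_d
    return (n_a - n_d) / (n_a + n_d)


def label_proportion_gap(rows, facet_col, facet_value, label_col, positive):
    """DPL = q_a - q_d, the gap in positive-label rates between groups."""
    in_facet = [r for r in rows if r[facet_col] == facet_value]
    others = [r for r in rows if r[facet_col] != facet_value]
    q_d = sum(1 for r in in_facet if r[label_col] == positive) / len(in_facet)
    q_a = sum(1 for r in others if r[label_col] == positive) / len(others)
    return q_a - q_d


# Toy data (hypothetical): six rows from city A, two from city B.
toy = [
    {"city": "A", "mobile": 1},
    {"city": "A", "mobile": 0},
    {"city": "A", "mobile": 0},
    {"city": "A", "mobile": 0},
    {"city": "A", "mobile": 0},
    {"city": "A", "mobile": 0},
    {"city": "B", "mobile": 1},
    {"city": "B", "mobile": 1},
]

ci = class_imbalance(toy, "city", "B")
dpl = label_proportion_gap(toy, "city", "B", "mobile", 1)
print(f"CI={ci:.3f}  DPL={dpl:.3f}")
```

In this toy data, city B is underrepresented (a positive CI) and all of its rows come from mobile stations (a strongly negative DPL), which is the kind of skew the bias report is meant to surface.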
To begin, add an analysis to the flow. Choose Bias Report as the analysis type and use the mobile column as the label, with 1 as the predicted value. Choose city as the column to use for bias analysis, then click Check for bias. In this scenario, we want to determine whether our dataset is somehow imbalanced with respect to the city and whether the data was collected at a mobile station. If the quality of data from mobile sources is inferior to non-mobile sources,...