Overcoming proxy bias
You can sometimes introduce bias even when none of your features or data points directly link to a protected class. Remember that a protected class is a characteristic such as age, sex, or religion. In these cases, the bias is introduced by proxy: the dataset contains a feature that strongly correlates with membership in a protected group, so information about that group bleeds into the training data through that feature.
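To see why dropping the protected column is not enough, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the `make_synthetic_data` and `proxy_accuracy` helpers are hypothetical, the protected attribute is a simple 0/1 flag, and the proxy feature is just a number whose average shifts with group membership. It is not taken from any real dataset.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def make_synthetic_data(n=2000, seed=0):
    """Hypothetical data: a 0/1 protected flag and a feature that tracks it."""
    rng = random.Random(seed)
    protected = [rng.randint(0, 1) for _ in range(n)]
    # Assumption: the feature's mean is 0 for one group and 2 for the other,
    # with unit-variance noise, so the feature acts as a proxy for the flag.
    feature = [rng.gauss(2.0 * b, 1.0) for b in protected]
    return protected, feature


def proxy_accuracy(protected, feature, threshold):
    """How often a simple threshold on the feature recovers the protected flag."""
    guesses = [1 if x > threshold else 0 for x in feature]
    hits = sum(g == b for g, b in zip(guesses, protected))
    return hits / len(protected)


protected, feature = make_synthetic_data()
corr = pearson(feature, protected)
acc = proxy_accuracy(protected, feature, threshold=1.0)

# The protected flag is never given to a model here, yet the feature alone
# recovers it far better than the 50% you would expect from guessing.
print(f"correlation between feature and protected flag: {corr:.2f}")
print(f"accuracy of guessing the flag from the feature: {acc:.2f}")
```

If a feature lets you guess group membership this well, a model trained on it can pick up the same signal, even though the protected column itself was never part of the training set.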
In the next diagram, you can see a representation of how proxy bias can leak into data. On the left, you have perfectly valid X and Y data, but there is also B, which is protected class data. Even though B isn't used directly in the training dataset, it is brought in by proxy through X:
Let's look at some examples of what proxy bias could look like to make this a bit more concrete.
Examples of proxy bias
The following list contains some examples...