Training to be fair
There are multiple ways to train models to be fairer. A simple approach is to use one of the fairness measures listed in the previous section as an additional loss term. In practice, however, this approach has several drawbacks, most notably poor performance on the actual classification task.
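To make this concrete, here is a minimal sketch of what such a combined loss could look like, written in PyTorch and assuming binary labels and a single binary sensitive attribute. The fair_loss function and its lam parameter are illustrative names, not code from this book:

```python
import torch
import torch.nn.functional as F

def fair_loss(logits, y_true, sensitive, lam=1.0):
    """Task loss plus a demographic-parity penalty.

    The penalty is the squared gap between the mean predicted
    positive rate of the protected group and that of everyone else;
    lam trades classification accuracy against fairness.
    Assumes each batch contains members of both groups.
    """
    logits = logits.squeeze(-1)
    task_loss = F.binary_cross_entropy_with_logits(logits, y_true.float())
    probs = torch.sigmoid(logits)
    rate_protected = probs[sensitive == 1].mean()  # positive rate, protected group
    rate_rest = probs[sensitive == 0].mean()       # positive rate, everyone else
    parity_gap = (rate_protected - rate_rest) ** 2
    return task_loss + lam * parity_gap
```

Because the parity term directly pulls the two groups' predictions toward each other, increasing lam quickly costs accuracy on the task itself, which is exactly the drawback mentioned above.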
An alternative approach is to use an adversarial network. Back in 2016, Louppe, Kagan, and Cranmer published the paper Learning to Pivot with Adversarial Networks, available at https://arxiv.org/abs/1611.01046. This paper showed how to use an adversarial network to train a classifier to ignore a nuisance parameter, such as a sensitive feature.
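A minimal sketch of this idea, again in PyTorch with hypothetical layer sizes, could look as follows: a classifier learns the task while an adversary tries to recover the sensitive feature from the classifier's output, and the classifier is rewarded for making the adversary fail. The paper's setup is more general (it pivots against arbitrary nuisance parameters), so treat this as an illustration rather than the reference implementation:

```python
import torch
import torch.nn as nn

n_features, hidden = 32, 64  # hypothetical sizes
clf = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                    nn.Linear(hidden, 1))   # predicts the label
adv = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                    nn.Linear(hidden, 1))   # predicts the sensitive feature
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # weight of the adversarial term

def train_step(x, y, z):
    """One alternating update; y and z are float tensors of shape (batch, 1)."""
    # 1) Train the adversary to recover the sensitive feature z
    #    from the classifier's output (detached, so this step does
    #    not change the classifier).
    adv_loss = bce(adv(clf(x).detach()), z)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Train the classifier to fit the label while fooling the
    #    adversary: minimize task loss minus the adversary's loss.
    y_hat = clf(x)
    clf_loss = bce(y_hat, y) - lam * bce(adv(y_hat), z)
    opt_clf.zero_grad()
    clf_loss.backward()  # also writes grads into adv; cleared in step 1
    opt_clf.step()
```

Alternating the two updates approximates the minimax game described in the paper, and lam again controls how much predictive power we are willing to give up in order to strip information about the sensitive feature out of the classifier's output.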
In this example, we will train a classifier to predict whether an adult makes over $50,000 in annual income. The challenge is to make our classifier unbiased with respect to race and gender, so that it focuses only on features we may legitimately discriminate on, such as a person's occupation and the gains they...