Sources of algorithmic bias
Because ML models learn from patterns in past data, they can unintentionally propagate biases present in their training datasets. Recognizing the roots of this bias is a vital first step toward building fairer models:
- One such source is historical bias, which mirrors existing prejudices and systemic inequalities in society. Consider a recruitment model trained on a company’s past hiring data: if the organization historically favored a specific group for certain roles, the model can replicate that favoritism and perpetuate the cycle (see the first sketch after this list).
- Representation (or sample) bias is another significant contributor. It occurs when certain groups are over- or underrepresented in the training data. For instance, a facial recognition model trained predominantly on images of light-skinned individuals may perform poorly when identifying faces with darker skin tones, favoring one group over the other (second sketch below).
- Proxy bias arises when a model relies on seemingly neutral features that are strongly correlated with a protected attribute. Even if the protected attribute itself is excluded from the training data, a proxy such as ZIP code, which can track race or socioeconomic status, lets the model reconstruct and act on it (third sketch below).
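
To make these failure modes concrete, here is a minimal sketch of historical bias, using entirely synthetic data. The group labels, skill scores, and the fixed hiring penalty are illustrative assumptions, not real hiring statistics; the point is only that a model trained on biased labels reproduces the disparity for equally skilled candidates.

```python
# Sketch: historical label bias propagating into a model's predictions.
# All data here is synthetic and the bias rate is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Two groups with identical skill distributions.
group = rng.integers(0, 2, size=n)            # 0 = group A, 1 = group B
skill = rng.normal(0.0, 1.0, size=n)

# Historically biased labels: equally skilled candidates from group B
# were hired less often (a fixed penalty on the hiring score).
hire_score = skill - 1.0 * group
hired = (hire_score + rng.normal(0, 0.5, n)) > 0

X = np.column_stack([group, skill])
model = LogisticRegression().fit(X, hired)

# The model reproduces the historical disparity for identical skill.
probe = np.array([[0, 0.0], [1, 0.0]])        # same skill, different group
p_a, p_b = model.predict_proba(probe)[:, 1]
print(f"P(hire | group A, average skill) = {p_a:.2f}")
print(f"P(hire | group B, average skill) = {p_b:.2f}")
```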
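
The second sketch illustrates representation bias with a per-group accuracy audit, again on synthetic data. The 95/5 sampling split and the per-group decision patterns are assumptions chosen to make the effect visible: the underrepresented group follows a different pattern, so a model fit mostly to the majority performs near chance on the minority.

```python
# Sketch: imbalanced group representation degrading minority accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def make_group(n, shift):
    """Two-feature binary task whose decision boundary differs per group."""
    X = rng.normal(0, 1, size=(n, 2))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(int)
    return X, y

# Majority group dominates training; the minority follows another pattern.
X_maj, y_maj = make_group(9_500, shift=+1.0)
X_min, y_min = make_group(500, shift=-1.0)
X_train = np.vstack([X_maj, X_min])
y_train = np.concatenate([y_maj, y_min])

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Evaluate on fresh, equally sized test sets per group.
for name, shift in [("majority", +1.0), ("minority", -1.0)]:
    X_test, y_test = make_group(2_000, shift)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} accuracy: {acc:.2f}")
```

Auditing accuracy per group, rather than in aggregate, is what exposes the gap: the overall accuracy looks acceptable because the majority group dominates the average.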
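
The third sketch illustrates proxy bias. Here `zip_code` is a hypothetical stand-in for a residentially segregated feature, and the correlation strength and outcome rule are assumptions: dropping the protected attribute ("fairness through unawareness") does not remove the disparity because the proxy still carries the signal.

```python
# Sketch: a proxy feature leaking a protected attribute into predictions.
# "zip_code" and the 0.9 correlation are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 10_000

protected = rng.integers(0, 2, size=n)            # protected attribute
# A segregated proxy: zip code tracks the protected group 90% of the time.
zip_code = np.where(rng.random(n) < 0.9, protected, 1 - protected)
income = rng.normal(50 + 10 * protected, 10, size=n)

# Biased historical outcome that depends on the protected attribute.
approved = (income + 20 * protected + rng.normal(0, 5, n)) > 70

# Drop the protected column, keep the proxy.
X = np.column_stack([zip_code, income])
model = LogisticRegression(max_iter=1000).fit(X, approved)

# Approval rates still diverge by group: zip_code reconstructs the signal.
pred = model.predict(X)
for g in (0, 1):
    rate = pred[protected == g].mean()
    print(f"group {g} predicted approval rate: {rate:.2f}")
```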