The data-centric imperative
Addressing bias in machine learning requires a holistic approach in which data-centric strategies complement model-centric techniques. Data-centricity means taking proactive steps to curate, clean, and enhance the dataset itself, thereby reducing the bias that models can inherit from it. By embracing data-centric practices, organizations can foster fairness, accountability, and ethical AI.
In the remainder of this chapter, we will explore a spectrum of data-centric strategies that empower machine learning practitioners to reduce bias. These include data resampling, augmentation, cleansing, feature selection, and more. Real-world examples will illustrate the tangible impact of these strategies in the domains of finance, human resources, and healthcare.
If data is captured or created fairly and accurately, the algorithms trained on it are likely to be mostly free from bias. However, the techniques we will cover in this chapter are applied after the data has been created, where...