Ethics in Data Acquisition and Management
Machine learning (ML) requires large amounts of data, which can come from a variety of sources, but not all sources are equally easy to use. In software engineering, we can design and develop systems that use data from other systems. We can also use data that does not originate from people; for example, data about defects or the complexity of systems. However, to provide more value to society, we often need to use data that contains information about people or their belongings; for example, when we train machines to recognize faces or license plates. Regardless of the use case, we need to follow ethical guidelines and, above all, hold to the guiding principle that our software should not cause any harm.
We start this chapter by exploring a few examples of unethical systems that show bias; for example, credit ranking systems that penalize certain minorities. I will also explain problems with using open source data and revealing the identities...