Data privacy
Data privacy is a major part of data science ethics. We may be working with healthcare, financial, or other personal data. Although we are looking at a screen full of numbers, it's important to remember that those numbers represent people. A powerful example is the Titanic dataset, which is often used in ML teaching materials. This dataset contains data on the passengers of the Titanic and is usually used as a classification exercise, where the goal is to predict whether a given passenger survived. When working with the data, it's easy to get lost in the numbers and the details of running the ML algorithm, but we should remember that each datapoint is a person who lived a life just like you or me.
Data privacy can be compromised in many ways:
- Data leaks (for example, being hacked or having data stolen)
- Combining anonymized data from multiple sources to deanonymize people
- Extracting information from ML algorithms ...
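To make the second item in the list above concrete, here is a minimal sketch of how two datasets might be combined to re-identify someone. Everything in it is a toy assumption made up for illustration: the records, the field names, the `QUASI_IDENTIFIERS` tuple, and the `link_records` helper are hypothetical and do not come from any real dataset or library. The idea is that a dataset with names removed can still contain fields (such as ZIP code, birth date, and sex) that also appear in a second, named dataset, and a unique match on those fields ties a name back to the "anonymous" row.

```python
# Hypothetical toy records; every name and value here is invented for illustration.
QUASI_IDENTIFIERS = ("zip_code", "birth_date", "sex")

# "Anonymized" dataset: names removed, but quasi-identifiers kept.
medical_records = [
    {"zip_code": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip_code": "02139", "birth_date": "1982-03-14", "sex": "M", "diagnosis": "asthma"},
    {"zip_code": "02139", "birth_date": "1982-03-14", "sex": "M", "diagnosis": "diabetes"},
]

# Second dataset that includes names alongside the same quasi-identifiers.
public_records = [
    {"name": "Alice Example", "zip_code": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip_code": "02139", "birth_date": "1982-03-14", "sex": "M"},
]


def link_records(anonymized, identified, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs where a named person matches exactly one anonymized row."""
    # Group the anonymized rows by their combination of quasi-identifier values.
    groups = {}
    for row in anonymized:
        groups.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for person in identified:
        candidates = groups.get(tuple(person[k] for k in keys), [])
        # Only a unique match re-identifies someone; several candidates stay ambiguous.
        if len(candidates) == 1:
            matches.append((person["name"], candidates[0]))
    return matches


reidentified = link_records(medical_records, public_records)
for name, record in reidentified:
    print(f"{name} -> {record['diagnosis']}")
```

In this toy data, only one person is linked to a diagnosis, because the other person's combination of fields matches two rows and so remains ambiguous. That is the intuition behind reducing the uniqueness of such field combinations before releasing data, although doing so does not by itself guarantee privacy.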