Why is privacy an issue in ML?
As we discussed in the previous chapters, ML models need large-scale training data to converge and train well. This data can be collected from social media, online transactions, surveys, questionnaires, or other sources, so it may contain sensitive information that individuals do not want to share with certain organizations or people. If such data were shared with or accessed by others, and the individuals in it were re-identified, they could face personal abuse, financial harm, or identity theft.
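To make the re-identification risk concrete, here is a minimal sketch with entirely hypothetical data. It illustrates the classic linkage-attack idea: even after names are removed, quasi-identifiers such as ZIP code, birth year, and gender can re-identify individuals when an "anonymized" dataset is joined against a public one. The records, field names, and the `link` helper are invented for this illustration.

```python
# "Anonymized" medical records: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02139", "birth_year": 1985, "gender": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "diagnosis": "asthma"},
]

# Public voter roll: names alongside the same quasi-identifiers.
voters = [
    {"name": "Alice", "zip": "02139", "birth_year": 1985, "gender": "F"},
    {"name": "Bob",   "zip": "94105", "birth_year": 1990, "gender": "M"},
]

def link(records, roll):
    """Join two tables on the quasi-identifiers (zip, birth_year, gender)."""
    key = lambda r: (r["zip"], r["birth_year"], r["gender"])
    roll_index = {key(v): v["name"] for v in roll}
    # Any medical record whose quasi-identifiers match a voter is re-identified.
    return {roll_index[key(r)]: r["diagnosis"]
            for r in records if key(r) in roll_index}

print(link(medical, voters))  # → {'Alice': 'flu'}
```

Here only Alice's quasi-identifiers match across both tables, so her diagnosis is exposed despite her name never appearing in the medical data. Real linkage attacks work the same way, just at scale.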
The complexity of a privacy breach in ML is closely related to the following three main factors:
- ML task
- Dataset size
- Regulations
Let’s take a closer look.
ML task
The task largely defines the type of training data that we need to collect and annotate. As a result, some ML tasks, such as weather prediction and music generation, raise fewer privacy issues than others, such as biometric...