Understanding inference attacks
Inference attacks are privacy attacks that aim to deduce sensitive or confidential information from an ML model or its outputs without direct access to the model's parameters or training data. They differ from model extraction and model inversion attacks, which try to recover specific artifacts used in training; inference attacks instead attempt to deduce broader facts about the training data or the model, posing a significant risk to privacy.
Inference attacks can be classified into two main categories:
- Attribute or property inference attacks: These attacks aim to infer global information, such as the data distribution, or individual properties of the training data or the model. For example, an attacker might try to determine the average age of individuals in a dataset used to train a health prediction model, or infer the architecture of a deep learning model used for image recognition.
- Membership inference attacks: These attacks aim to determine whether a specific data record was part of a model's training dataset. For example, an attacker might try to confirm that a particular patient's record was used to train a disease prediction model, a fact that is itself sensitive.
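To make the membership inference idea concrete, here is a minimal, illustrative sketch in Python with NumPy. It is not taken from any particular attack implementation: the victim is a deliberately overfit 1-nearest-neighbour model that memorises its training set, and the attacker simply thresholds the model's confidence on a queried point, guessing "member" when confidence is unusually high. All names (`victim_confidence`, `is_member`, the threshold value) are hypothetical choices for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "victim" model: a 1-nearest-neighbour classifier, which memorises
# its training set -- an extreme case of overfitting.
train_X = rng.normal(size=(50, 5))

def victim_confidence(x):
    # Hypothetical confidence proxy: inverse distance to the nearest
    # training point. Memorised points get confidence exactly 1.0.
    d = np.linalg.norm(train_X - x, axis=1).min()
    return 1.0 / (1.0 + d)

def is_member(x, threshold=0.99):
    # Attacker's rule: points the model is unusually confident on are
    # guessed to be training-set members.
    return victim_confidence(x) >= threshold

members = [is_member(x) for x in train_X]                  # training points
non_members = [is_member(x) for x in rng.normal(size=(50, 5))]  # fresh points
```

On this toy setup the attack separates members from non-members almost perfectly, precisely because the victim overfits; real attacks face noisier confidence signals and typically calibrate the threshold using shadow models.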
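A property inference attack can be sketched in a similarly simplified way. In this hypothetical example (all names and numbers are assumptions, not from any real system), the secret global property is the class balance of the victim's training data. The victim is a tiny Gaussian naive Bayes model whose fitted class prior reflects that balance, and the attacker probes it at a point where the two class likelihoods are roughly equal, so the model's output there approximately equals its training prior. The probe location is assumed known to the attacker, which is a strong simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_victim(class1_fraction, n=2000):
    """Toy Gaussian naive Bayes victim; its class prior is fit from data."""
    n1 = int(n * class1_fraction)
    x = np.concatenate([rng.normal(0.0, 1.0, n - n1),   # class-0 feature
                        rng.normal(2.0, 1.0, n1)])      # class-1 feature
    y = np.concatenate([np.zeros(n - n1), np.ones(n1)])
    mu0, mu1 = x[y == 0].mean(), x[y == 1].mean()
    prior1 = y.mean()   # the secret property (class balance) leaks here
    def predict_proba(q):
        # Equal-variance Gaussian likelihoods weighted by the fitted prior.
        l0 = np.exp(-0.5 * (q - mu0) ** 2) * (1.0 - prior1)
        l1 = np.exp(-0.5 * (q - mu1) ** 2) * prior1
        return l1 / (l0 + l1)
    return predict_proba

def infer_majority_class(predict_proba, probe=1.0):
    # Probe where the class likelihoods are (roughly) equal: the model's
    # answer there is approximately its training class prior.
    return predict_proba(probe) > 0.5

majority_model = train_victim(0.8)  # 80% of training labels are class 1
minority_model = train_victim(0.2)  # 20% of training labels are class 1
```

Here `infer_majority_class(majority_model)` reports that class 1 dominates the training data while `infer_majority_class(minority_model)` does not, even though the attacker only queried the model's outputs. Real property inference attacks generalise this by training a meta-classifier over many shadow models instead of hand-picking a single probe.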