Attribute inference attacks
Also known as property inference attacks, attribute inference attacks target global or group-level properties of the training data or the model, such as the data distribution, the data labels, the model architecture, or the model hyperparameters, rather than information about a specific individual. An adversary can launch these attacks with white-box access to the model's parameters or gradients, or with black-box access to the model's inference endpoint. Different approaches are possible, depending on the attacker's level of access.
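What counts as "observed information" follows from that level of access. The minimal sketch below, which assumes a PyTorch classifier, shows two hypothetical feature extractors: one flattens the parameters available under white-box access, and one records the outputs a black-box inference endpoint returns for a fixed set of probe inputs. The names (`white_box_features`, `black_box_features`, `probe_inputs`) are illustrative, not part of any specific published attack.

```python
import torch
import torch.nn as nn

def white_box_features(model: nn.Module) -> torch.Tensor:
    """Flatten the target model's parameters into a single feature vector.

    With white-box access, the raw weights (or gradients) can be fed
    directly to an attack model as its input representation.
    """
    return torch.cat([p.detach().flatten() for p in model.parameters()])

def black_box_features(predict_fn, probe_inputs: torch.Tensor) -> torch.Tensor:
    """Query the target's inference endpoint on fixed probe inputs and
    concatenate the returned output vectors (e.g., class probabilities).

    With only black-box access, these responses are the observed
    information available to the attacker.
    """
    with torch.no_grad():
        outputs = predict_fn(probe_inputs)  # shape: (n_probes, n_classes)
    return outputs.flatten()

if __name__ == "__main__":
    # Toy stand-in for the real target model.
    target = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    probes = torch.randn(8, 10)  # probe inputs chosen by the attacker

    wb = white_box_features(target)
    bb = black_box_features(lambda x: torch.softmax(target(x), dim=1), probes)
    print(wb.shape, bb.shape)
```

Either feature vector can then serve as one training example for the attack model described next, labeled with whether the model that produced it has the property of interest.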
Meta-classifiers
The general strategy of these attacks is to train an attack model, called a meta-classifier (or meta-regressor), that takes the information observed from the target model as input and predicts whether the target has a specific confidential property. To obtain training data for the meta-classifier, the attacker can train shadow models. The attack typically follows this workflow:
- Shadow model training: The attacker...