Membership inference attacks
In a membership inference attack, the attacker tries to ascertain whether a given data item is part of a model’s training dataset. In the black-box form of the attack, the attacker infers membership by analyzing the model’s outputs for that specific data item. By targeting individual records, these attacks can reveal sensitive information about the individuals whose data was used for training, such as their health status, preferences, or behavior. This can lead to violations of privacy regulations and of the policies of the data or ML service providers.
An adversary can perform these attacks with black-box access to the model’s predictions or white-box access to the model’s parameters or gradients. The general strategy is to exploit the model’s overfitting or memorization behavior, which causes it to perform better on the training data than on unseen data. For example, an attacker can use the model’...
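The overfitting-based strategy described above can be illustrated with a minimal sketch of a black-box, confidence-threshold attack. Everything here is an assumption made for illustration: the "model" is a hypothetical toy that memorizes one-dimensional training points, the names make_memorizing_model, infer_membership, and attack_accuracy are invented, and the 0.99 threshold is an arbitrary choice for this setup. Real attacks target real classifiers and typically calibrate the decision rule far more carefully.

```python
import math
import random


def make_memorizing_model(training_points):
    """Return a toy black-box model that is maximally overfit.

    Its confidence is 1.0 on memorized training points and decays with
    the distance to the nearest one. This is a hypothetical stand-in,
    not a real classifier.
    """
    def predict_confidence(x):
        nearest = min(abs(x - p) for p in training_points)
        return math.exp(-nearest)

    return predict_confidence


def infer_membership(predict_confidence, x, threshold=0.99):
    """Guess 'member' when the model is unusually confident on x.

    The attacker only calls predict_confidence (black-box access).
    The threshold is an assumed value chosen for this toy setup.
    """
    return predict_confidence(x) >= threshold


def attack_accuracy(predict_confidence, members, non_members, threshold=0.99):
    """Fraction of correct membership guesses over both groups."""
    true_pos = sum(
        infer_membership(predict_confidence, x, threshold) for x in members
    )
    true_neg = sum(
        not infer_membership(predict_confidence, x, threshold) for x in non_members
    )
    return (true_pos + true_neg) / (len(members) + len(non_members))


rng = random.Random(0)  # fixed seed so the run is repeatable
train = [rng.uniform(0.0, 100.0) for _ in range(50)]   # records the model saw
unseen = [rng.uniform(0.0, 100.0) for _ in range(50)]  # records it did not see

model = make_memorizing_model(train)
accuracy = attack_accuracy(model, train, unseen)
print(f"attack accuracy: {accuracy:.2f}")
```

Because this toy model is perfectly confident on every memorized record, the threshold rule flags all training points as members. A model that generalizes well would show a much smaller confidence gap between seen and unseen records, and the same rule would perform closer to random guessing.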