Universal Adversarial Perturbations (UAPs)
UAPs pose a unique and significant threat to ML models. Unlike traditional adversarial examples, which are crafted for a specific input and model, a UAP is a single input perturbation that remains effective across many inputs and often transfers across a wide range of models. UAPs exploit shared vulnerabilities in the feature spaces and decision boundaries that different models learn, allowing an attacker to create one perturbation that causes misclassification on most models trained to perform the same task.
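In the standard formulation (Moosavi-Dezfooli et al., 2017), a UAP is a single perturbation vector v, constrained to a small norm ball ||v|| <= eps, such that f(x + v) != f(x) for most inputs x drawn from the data distribution; the same v is reused for every input rather than being recomputed per sample.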
Attack scenario
Consider the case where multiple image recognition systems, each with a different architecture (e.g., VGG19, ResNet50, and InceptionV3), are deployed in security-sensitive environments. An attacker creates a UAP that, when added to almost any input image, causes all three models to misclassify it. The perturbation is universal in that it does not target the idiosyncrasies of a single model but rather the commonalities in the decision boundaries that the models share.
Attack example
The following Python code demonstrates how a single universal perturbation can be crafted against one model and then evaluated against others.
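What follows is a minimal sketch rather than a production attack: it assumes PyTorch and torchvision are installed, an L-infinity budget eps, and a hypothetical train_loader yielding 224x224 images with labels; the function name craft_uap is illustrative. It accumulates one perturbation with a simple PGD-style loop (a simplification of the original DeepFool-based UAP algorithm), crafting it against one model so it can then be tested against the other architectures.

```python
# UAP sketch (assumes PyTorch + torchvision; train_loader is hypothetical).
import torch
import torch.nn.functional as F
from torchvision import models


def craft_uap(model, loader, eps=10 / 255, step=1 / 255, epochs=5, device="cpu"):
    """Accumulate a single perturbation that degrades accuracy on most inputs,
    projected back into an L-infinity ball of radius eps after every step."""
    model.eval().to(device)
    for p in model.parameters():          # freeze weights; only the UAP is updated
        p.requires_grad_(False)
    uap = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            # Maximize the loss of the perturbed batch (untargeted attack).
            loss = F.cross_entropy(model(images + uap), labels)
            loss.backward()
            with torch.no_grad():
                uap += step * uap.grad.sign()   # gradient ascent step
                uap.clamp_(-eps, eps)           # project into the eps-ball
            uap.grad.zero_()
    return uap.detach()


# Example usage (hypothetical): craft against ResNet50, test transfer to VGG19.
# resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
# uap = craft_uap(resnet, train_loader)
# adv_logits = vgg((images + uap).clamp(0, 1))
```

Because the perturbation is optimized over many images rather than one, the resulting tensor can be saved and reused against any input; its transferability to VGG19 or InceptionV3 can be measured simply by comparing clean and perturbed accuracy on a held-out set.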