Learning about evasion attacks
There are six broad categories of adversarial attacks:
- Evasion: crafting an input that causes a model to make an incorrect prediction, especially one that wouldn’t fool a human observer. The attack can be targeted, where the attacker wants the input misclassified as a specific class, or untargeted, where any misclassification will do. Attack methods can be white-box, where the attacker has full access to the model and its training dataset; black-box, where they have only inference access; or gray-box, which sits in between. Black-box attacks are always model-agnostic, whereas white-box and gray-box methods may or may not be.
- Poisoning: injecting faulty training data or parameters into a model. Poisoning can take many forms, depending on the attacker’s capabilities and access. For instance, in systems that rely on user-generated data, the attacker may be able to add faulty data or labels. If they have more access...