Learning about evasion attacks
There are six broad categories of adversarial attacks, detailed as follows:
- Evasion: Designing an input that causes a model to make an incorrect prediction, especially when the same input would not fool a human observer. An evasion attack is targeted if the attacker wants the model to misclassify the input as a specific class, and untargeted if any incorrect prediction will do. The attack methods are white-box if the attacker has full access to the model and its training dataset, and black-box if the attacker has only inference access; gray-box falls in between. Black-box methods are always model-agnostic, whereas white-box and gray-box methods may or may not be.
- Poisoning: Injecting faulty training data or parameters into a model. This can take many forms, depending on the attacker's capabilities and access. For instance, in systems with user-generated data, the attacker may be able to add faulty data or labels. If they have more access, they could perhaps modify large amounts...
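To make the evasion category concrete, here is a minimal sketch of an untargeted, white-box evasion step in the style of the fast gradient sign method (FGSM). FGSM is not named in the text above; it is used here only as a well-known illustration. The logistic regression weights `w` and `b`, the input `x`, and the budget `eps` are all made-up values, and the helper names `predict_proba` and `fgsm_untargeted` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    # Probability of class 1 under a logistic regression model.
    return sigmoid(x @ w + b)

def fgsm_untargeted(x, y, w, b, eps):
    # White-box step: the attacker knows w and b, so the gradient of the
    # binary cross-entropy loss with respect to the input is (p - y) * w.
    grad = (predict_proba(x, w, b) - y) * w
    # Move each feature by at most eps in the direction that raises the loss.
    return x + eps * np.sign(grad)

# Illustrative (assumed) model parameters and input; the true label is 1.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, 0.1, 0.2])
y = 1
eps = 0.3

x_adv = fgsm_untargeted(x, y, w, b, eps)
p_clean = predict_proba(x, w, b)
p_adv = predict_proba(x_adv, w, b)

print("clean probability of class 1:", p_clean)
print("adversarial probability of class 1:", p_adv)
```

With these assumed values, the clean input is assigned to class 1 and the perturbed input to class 0, even though no feature moves by more than `eps`. A black-box attacker would not have `w` and `b` and would have to estimate a useful perturbation direction from inference queries alone.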