Defending against targeted attacks with preprocessing
There are five broad categories of adversarial defenses:
- Preprocessing: changing the model’s inputs so that they are harder to attack (see the sketch after this list).
- Training: training a new robust model that is designed to withstand attacks, for example by including adversarial examples in the training data (adversarial training).
- Detection: detecting attacks at inference time. For instance, you can train a separate model to flag adversarial examples.
- Transformer: modifying the model’s architecture and training so that it is more robust; this may include techniques such as distillation, input filters, neuron pruning, and unlearning.
- Postprocessing: changing the model’s outputs to defend against inference or model extraction attacks on a production model.
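To make the preprocessing idea concrete, here is a minimal sketch of one widely studied approach: bit-depth reduction plus median smoothing, the squeezing operations popularized by feature squeezing (Xu et al., 2017). The helper names and parameter values below are illustrative assumptions, not from any particular library.

```python
import numpy as np
from scipy.ndimage import median_filter

def squeeze_bit_depth(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Quantize pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x: np.ndarray, window: int = 3) -> np.ndarray:
    """Median-filter the spatial dimensions of an (H, W, C) image."""
    return median_filter(x, size=(window, window, 1))

def preprocess(x: np.ndarray) -> np.ndarray:
    """Squeeze the input before inference to blunt small, finely
    tuned perturbations such as those produced by FGSM or C&W."""
    return median_smooth(squeeze_bit_depth(x))

# Usage (model is a placeholder): model.predict(preprocess(image))
```

Because this defense sits in front of the model, it requires no retraining; the usual trade-off is some loss of accuracy on clean inputs.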
Only the first four defenses work against evasion attacks, and in this chapter we will cover only the first two: preprocessing and adversarial training. FGSM and C&W attacks can be defended against easily with either of these, but an AP is tougher to defend against, so it might require a stronger detection...
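As a preview of the second defense, the following is a minimal sketch of one adversarial training step against FGSM in PyTorch; `model`, `optimizer`, `x`, `y`, and the `epsilon` value are placeholder assumptions, not the chapter’s actual setup.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, epsilon=0.03):
    """One training step on FGSM adversarial examples (a sketch)."""
    # Craft FGSM examples by perturbing inputs along the sign of the
    # loss gradient: the same attack we are defending against.
    x = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0]
    x_adv = (x + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

    # Train on the adversarial examples so the model learns to
    # classify them correctly.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, adversarial training usually mixes clean and adversarial batches and may use stronger multi-step attacks such as PGD; this sketch shows only the core loop.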