Defending against targeted attacks with preprocessing
There are five broad categories of adversarial defenses, detailed as follows:
- Preprocessing: Changing a model's inputs so that they are harder to attack.
- Adversarial training: Training a new robust model that is designed to overcome attacks.
- Detection: Detecting attacks—for instance, you can train a model to detect adversarial examples.
- Transformer: Modifying the model architecture and training so that it's more robust—this may include techniques such as distillation, input filters, neuron pruning, and unlearning.
- Postprocessing: Changing model outputs to overcome production-inference or model-extraction attacks.
Only the first four defense categories apply to evasion attacks, and in this chapter we will cover only the first two: preprocessing and adversarial training. FGSM and C&W can be defended against fairly easily with either of these, but AP is tougher to defend against, so it might require...
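To make the preprocessing idea concrete, here is a minimal sketch of one well-known preprocessing defense, feature squeezing by bit-depth reduction. The function name `squeeze_bits` and the example values are ours, not from the text; the point is only to show how quantizing inputs before inference can erase a small FGSM-style perturbation:

```python
import numpy as np

def squeeze_bits(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Feature squeezing: reduce the bit depth of pixel values in [0, 1].

    Quantizing away the low-order bits removes much of the small,
    carefully crafted noise an evasion attack adds, while leaving the
    image essentially unchanged for the model.
    """
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

# A clean pixel and the same pixel with a small FGSM-style perturbation
clean = np.array([0.50])
perturbed = clean + 0.02  # epsilon-sized additive noise

# After squeezing to 4 bits, both map to the same quantized value,
# so the perturbation is wiped out before it reaches the model
print(squeeze_bits(clean), squeeze_bits(perturbed))
```

This defense costs nothing at training time, since it only wraps the model's input pipeline; the trade-off is that aggressive squeezing (fewer bits) can also degrade accuracy on clean inputs.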