Performing an adversarial attack on images
In the previous section, we learned how to generate an image from random noise using a VAE. However, that was an unsupervised exercise. What if we want to modify an image so subtly that the change is indistinguishable from the original to a human, yet the neural network model perceives the object as belonging to a different class? Adversarial attacks on images come in handy in such a scenario.
Adversarial attacks refer to the changes we make to input image values (pixels) so that we meet a certain objective. Understanding them is especially helpful for making our models robust so that they are not fooled by minor modifications. In this section, we will learn how to modify an image slightly so that a pre-trained model predicts it as a different class (specified by the user) rather than its original class. The strategy we will adopt is as follows:
- Provide an image of an elephant. ...
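To make the idea concrete before we walk through these steps, here is a minimal sketch of a targeted attack, assuming a pre-trained torchvision classifier. The model choice, step count, learning rate, and perturbation budget `eps` are illustrative assumptions, not the exact values used later in this section:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Freeze a pre-trained classifier; we only optimize the perturbation.
model = models.resnet50(pretrained=True).eval()
for param in model.parameters():
    param.requires_grad_(False)

def targeted_attack(image, target_class, steps=50, lr=1e-2, eps=0.02):
    """Perturb `image` (a preprocessed 1x3xHxW tensor) so that the model
    predicts `target_class`, keeping each pixel change within +/- eps."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Minimize cross-entropy with respect to the *target* class,
        # pushing the model toward predicting that class.
        loss = F.cross_entropy(model(image + delta),
                               torch.tensor([target_class]))
        loss.backward()
        optimizer.step()
        delta.data.clamp_(-eps, eps)  # keep the perturbation imperceptible
    return (image + delta).detach()

# Hypothetical usage: `elephant_tensor` is the preprocessed elephant image,
# and `target_class` is the ImageNet index we want the model to predict.
# adversarial = targeted_attack(elephant_tensor, target_class=340)
```

Note that only the perturbation `delta` receives gradients; the model's weights stay fixed, which is what distinguishes an adversarial attack from training.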