Understanding adversarial attacks
Adversarial attacks exploit weaknesses in ML models to make them produce incorrect predictions. Imagine you have an ML model that can accurately identify pictures of animals. An adversarial attack might manipulate the input image of an animal in such a way that the model misidentifies it as a different animal.
These attacks work by making small, often imperceptible changes to the input data that the model is processing. These changes are designed to be undetectable by humans but can cause the model to make large errors in its predictions. Adversarial attacks can be used to undermine the performance of ML models in a variety of settings, including image recognition, speech recognition, and natural language processing (NLP). There are two types of adversarial attack objectives: targeted and untargeted. A targeted objective aims to make the ML system predict a specific class chosen by the attacker, while an untargeted objective aims only to make the system predict any class other than the correct one.
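To make the targeted versus untargeted distinction concrete, the following is a minimal sketch of a fast gradient sign method (FGSM) style perturbation, assuming a PyTorch image classifier. The names model, image, label, target_class, and epsilon are illustrative placeholders rather than part of any specific framework API; epsilon controls how large the (ideally imperceptible) perturbation is.

```python
import torch
import torch.nn.functional as F

def fgsm_untargeted(model, image, label, epsilon=0.03):
    """Nudge `image` so the model is less likely to predict the true `label`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Untargeted objective: step *up* the loss surface for the true class,
    # so any incorrect prediction counts as a success for the attacker.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

def fgsm_targeted(model, image, target_class, epsilon=0.03):
    """Nudge `image` so the model is more likely to predict `target_class`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), target_class)
    loss.backward()
    # Targeted objective: step *down* the loss surface for the class the
    # attacker has chosen, pushing the prediction toward that specific label.
    return (image - epsilon * image.grad.sign()).clamp(0, 1).detach()
```

In both cases the perturbation is bounded by epsilon per pixel, which is why the change can remain invisible to a human while still flipping the model's prediction; the only difference between the two objectives is the direction of the gradient step.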