Recently, there has been growing interest in adversarial training with adversarial neural networks (Abadi, M., et al. (2016)), because such networks can be trained to protect a model from AI-based adversaries. Adversarial learning can be categorized into two major branches:
- Black box: In this category, the machine learning model is exposed only as a black box, and the adversary learns to attack it purely through queries: it crafts fake inputs (within some bounds) that make the model fail, but it has no access to the internals of the model it is attacking (Ilyas, A., et al. (2018)).
- Insider: This type of adversarial learning is part of the training process of the model it aims to attack. The adversary influences training so that the resulting model learns not to be fooled by such an adversary (Goodfellow, I., et al. (2014)).
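The two categories above can be made concrete with small sketches. First, a black-box attack: the attacker can only query the model and observe its output, so a simple strategy is random search for a bounded perturbation that flips the prediction. The model and attack below are hypothetical toys for illustration, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_model(x):
    # Hypothetical opaque classifier: the attacker can only query it.
    # Internally it labels a point by the sign of a hidden linear rule.
    w = np.array([1.5, -2.0])
    return int(x @ w > 0)

def random_search_attack(x, true_label, epsilon=0.5, queries=200):
    """Query-only attack: sample perturbations inside an L-infinity
    ball of radius epsilon until the model's prediction flips."""
    for _ in range(queries):
        delta = rng.uniform(-epsilon, epsilon, size=x.shape)
        x_adv = x + delta
        if black_box_model(x_adv) != true_label:
            return x_adv  # success: model fooled with no gradient access
    return None  # attack failed within the query budget

x = np.array([0.2, 0.1])
label = black_box_model(x)
adv = random_search_attack(x, label)
```

Note that the attacker never sees the hidden weights; it relies only on input/output behavior, which is exactly what makes this a black-box setting. Real query-efficient attacks (such as the gradient-estimation methods of Ilyas et al.) are far more sample-efficient than random search.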
There are pros and cons to each of these:
| Black box pros |
| --- |
| Black... |
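The insider setting can be sketched as well: the adversary participates in the training loop, perturbing each input in the direction that most increases the loss, while the model is updated on those perturbed inputs so it learns to resist them. This is a minimal adversarial-training loop on a toy logistic regression (the data, step sizes, and perturbation budget are illustrative assumptions, not from the cited work).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D binary classification data: two Gaussian clusters.
X = rng.normal(size=(200, 2)) + np.array([[2.0, 2.0]])
X[:100] -= 4.0
y = np.concatenate([np.zeros(100), np.ones(100)])

w = np.zeros(2)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

epsilon, lr = 0.3, 0.1
for _ in range(300):
    # Insider adversary: shift each input in the direction that most
    # increases the logistic loss (sign of the input gradient).
    p = sigmoid(X @ w + b)
    grad_x = np.outer(p - y, w)            # dLoss/dX for logistic loss
    X_adv = X + epsilon * np.sign(grad_x)
    # Defender: update the model on the adversarial inputs, so it is
    # trained not to be fooled by this adversary.
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * X_adv.T @ (p_adv - y) / len(y)
    b -= lr * np.mean(p_adv - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

The key design point is the minimax structure: the inner step maximizes the loss over a bounded perturbation, the outer step minimizes it over the model parameters, so adversary and model are trained together rather than the adversary attacking a frozen model from the outside.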