NLP evasion attacks with BERT using TextAttack
Evasion attacks initially focused on image classification tasks, but their underlying principles can be adapted to NLP. TextAttack is a popular Python framework for generating adversarial text inputs. We will demonstrate how to use it to stage adversarial attacks in two NLP scenarios: sentiment analysis and language inference.
Let’s start with sentiment analysis.
Attack scenario – sentiment analysis
In NLP, linear classifiers such as logistic regression or linear support vector machines (SVMs), as well as language models such as BERT, are often used for tasks such as sentiment analysis or spam detection. These classifiers learn a decision boundary that separates the classes in feature space. An adversarial sample in NLP might change words or phrases in a text snippet so that its classification flips from positive to negative sentiment, or from non-spam to spam, with the smallest change possible....
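To make the idea of crossing a decision boundary with a small change concrete, here is a minimal sketch in plain Python. It does not use TextAttack or BERT. The word weights, bias, and synonym table are made-up values chosen for illustration, and the names `score`, `predict`, and `greedy_attack` are hypothetical helpers. The greedy search only approximates the smallest possible change; it is not guaranteed to find it.

```python
# Toy illustration only: these weights and synonyms are invented for the
# example. They are not learned from data and are not part of TextAttack.
WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.2, "okay": 0.1,
           "bad": -1.5, "boring": -1.0, "slow": -0.5}
BIAS = -0.5
SYNONYMS = {"great": ["good", "fine"], "good": ["fine", "okay"]}


def score(tokens):
    """Linear decision function: bias plus the sum of per-word weights."""
    return BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)


def predict(tokens):
    """The decision boundary is score == 0; above it is positive."""
    return "positive" if score(tokens) > 0 else "negative"


def greedy_attack(tokens, max_swaps=2):
    """Swap one word at a time, each time choosing the synonym that moves
    the score furthest toward the opposite class, until the label flips
    or the swap budget is used up."""
    original = predict(tokens)
    sign = 1 if original == "positive" else -1  # direction that hurts the label
    current = list(tokens)
    swaps = 0
    while predict(current) == original and swaps < max_swaps:
        best = None
        for i, tok in enumerate(current):
            for cand in SYNONYMS.get(tok, []):
                trial = current[:i] + [cand] + current[i + 1:]
                gain = sign * (score(current) - score(trial))
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, trial)
        if best is None:  # no substitution helps, so give up
            break
        current = best[1]
        swaps += 1
    return current, predict(current) != original, swaps


text = "the movie was great but slow".split()
adversarial, flipped, swaps = greedy_attack(text)
print(predict(text), "->", predict(adversarial), "|", " ".join(adversarial))
```

With these toy values, a single substitution is enough to move the sentence across the boundary. Real attacks on models such as BERT follow the same pattern but must query the model to estimate each word's influence, and they typically add constraints so that the perturbed text keeps its meaning and grammar.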