Exploring adversarial analysis for text-based models
Text-based models can be vulnerable to the usage of certain words, a specific inflection of a word stem, a different form of the same word, or even a small character-level change to a word. Here's an example:
Supervised Use Case: Sentiment Analysis
Prediction Row: {"Text": "I love this product!", "Sentiment": "Positive"}
Adversarial Example: {"Text": "I l0ve this product!", "Sentiment": "Negative"}
So, adversarial analysis can be performed by benchmarking performance on sentences where important words are added or perturbed versus the original sentences. To mitigate such attacks, similar-word replacement augmentation can be applied during training, as sketched below.
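Here is a minimal sketch of both ideas, assuming a hypothetical predict_sentiment function that stands in for whatever trained model you are evaluating; the word lists are illustrative only:

```python
# A minimal sketch of adversarial benchmarking and similar-word
# augmentation. predict_sentiment is a hypothetical stand-in for
# the trained model under evaluation.
import random

def perturb(text, replacements):
    """Swap important words for adversarial variants (e.g., leetspeak)."""
    for word, variant in replacements.items():
        text = text.replace(word, variant)
    return text

def benchmark(texts, labels, predict_sentiment, replacements):
    """Compare accuracy on original versus perturbed sentences."""
    clean_hits, perturbed_hits = 0, 0
    for text, label in zip(texts, labels):
        clean_hits += int(predict_sentiment(text) == label)
        perturbed_hits += int(predict_sentiment(perturb(text, replacements)) == label)
    n = len(texts)
    return clean_hits / n, perturbed_hits / n

def augment(texts, labels, synonyms):
    """Create extra training rows by replacing words with similar ones."""
    extra = []
    for text, label in zip(texts, labels):
        for word, candidates in synonyms.items():
            if word in text:
                extra.append((text.replace(word, random.choice(candidates)), label))
    return extra

# Example usage with the substitution from the adversarial example above:
texts, labels = ["I love this product!"], ["Positive"]
replacements = {"love": "l0ve"}
synonyms = {"love": ["adore", "like", "enjoy"]}
print(augment(texts, labels, synonyms))
```

A large gap between the clean and perturbed accuracies returned by benchmark signals that the model leans heavily on the exact surface form of those words.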
However, most widely adopted text-based models today rely on a pre-trained language-modeling foundation, which allows them to understand natural language even after domain fine-tuning...
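For instance, such a pre-trained model can be probed directly with the perturbed example from earlier. This sketch uses the Hugging Face transformers pipeline; the default model it downloads, and therefore its outputs, are assumptions and will vary:

```python
# A sketch probing a pre-trained sentiment classifier with the
# adversarial example above (requires the transformers package;
# which default model is downloaded is an assumption, not a given).
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

for text in ["I love this product!", "I l0ve this product!"]:
    result = classifier(text)[0]
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")
```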