Analyzing adversarial performance for audio-based models
Adversarial analysis for audio-based models requires audio augmentations. In this section, we will use the open source audiomentations library to apply audio augmentation methods and practically analyze the adversarial performance of a speech recognition model. The metric we’ll use is the Word Error Rate (WER), which is commonly used in automatic speech recognition and machine translation systems. Note that WER is an error metric rather than an accuracy metric, so lower values are better. It measures the dissimilarity between a system’s output and the reference transcription or translation by dividing the sum of word substitutions, insertions, and deletions by the total number of reference words. The result is usually reported as a percentage and can exceed 100% when the output contains many inserted words. The formula for WER is as follows:
WER = (S + I + D) / N
Here, we have the following:
- S represents the number of word substitutions
- I represents the number of word insertions
- D represents the number...
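To make the formula concrete, the following is a minimal sketch of how WER could be computed in plain Python. It assumes whitespace tokenization with no text normalization, and it obtains the combined count S + I + D as the word-level edit (Levenshtein) distance between the reference and the system output, with N taken as the number of words in the reference. The function name word_error_rate is a hypothetical helper chosen for illustration; it is not part of audiomentations or any other library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + I + D) / N for two whitespace-tokenized strings.

    Illustrative helper (an assumption, not a library API): the numerator is
    the word-level Levenshtein distance, which is the minimum total number of
    substitutions, insertions, and deletions needed to turn the reference
    into the hypothesis.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # i deletions turn i reference words into an empty output
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"), N = 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In practice, transcripts are often lowercased and stripped of punctuation before scoring, which this sketch deliberately leaves out. Because insertions are counted against the reference length only, an output with many extra words can yield a value above 1.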