Milestone 6 – Establishing standardized test sets and metrics for performance benchmarking
Now, let’s learn how to check our model’s performance. We’ll use the word error rate (WER) metric, a common way to evaluate speech recognition systems. We’ll load the WER metric from the Hugging Face evaluate library:
import evaluate

metric = evaluate.load("wer")
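WER counts the word-level substitutions, deletions, and insertions needed to turn a prediction into its reference, divided by the number of words in the reference. To see what metric.compute returns, here is a toy check using the metric object we just loaded (the strings are made up for illustration; note the result is a fraction, not a percentage):

predictions = ["the cat sat on the mat"]
references = ["the cat sat on a mat"]

print(metric.compute(predictions=predictions, references=references))
# One substitution out of 6 reference words -> roughly 0.167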
Next, we’ll create a function called compute_metrics to calculate the WER:
def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids

    # Restore the padding tokens that were masked with -100 for the loss
    # (assumes the Whisper tokenizer loaded earlier in the setup)
    label_ids[label_ids == -100] = tokenizer.pad_token_id

    # Decode both the predicted and label IDs into text strings
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)

    # Compute the WER between predictions and references, as a percentage
    wer = 100 * metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}
This function first fixes our label_ids (where we had replaced padding tokens with -100). It then decodes both the predicted and label IDs into text strings. Lastly, it computes the WER between the two.
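To make the -100 handling concrete, here is a tiny standalone sketch; the token IDs and pad ID below are made up, and in the real function the pad ID comes from tokenizer.pad_token_id:

import numpy as np

label_ids = np.array([[464, 3797, 3332, -100, -100]])  # hypothetical label IDs padded with -100
pad_token_id = 50257  # hypothetical pad token ID
label_ids[label_ids == -100] = pad_token_id
print(label_ids)  # [[  464  3797  3332 50257 50257]] -- now safe to batch_decode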
Loading a pre-trained model checkpoint
We’ll start with a pre-trained Whisper model. This is easy with Hugging Face Transformers:
from transformers import WhisperForConditionalGeneration
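From here, loading the checkpoint is a single call. A minimal sketch, assuming the openai/whisper-small checkpoint (our assumption; any Whisper size loads the same way) and its matching processor, with one second of silence as stand-in audio for a quick sanity check:

import numpy as np
from transformers import WhisperProcessor

# "small" is an assumed checkpoint size; swap in the one you are fine-tuning
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# One second of silence at 16 kHz as stand-in audio for a smoke test
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

generated_ids = model.generate(inputs.input_features, max_new_tokens=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))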