Technical requirements
For this chapter, we will use Google Colaboratory. We’ll try to secure the best GPU we can afford, with a minimum of 12 GB of GPU memory.
To get a GPU in Google Colab, click Runtime | Change runtime type in the main menu, then change the Hardware accelerator from None to GPU.
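Once the runtime has restarted, it is worth confirming that a GPU is actually attached and that it meets the 12 GB minimum. The following sketch is a quick check using PyTorch (preinstalled on Colab) and is illustrative rather than part of the chapter’s notebook:

```python
# Sanity check: confirm a CUDA GPU is attached and report its total memory.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name} with {total_gb:.1f} GB of memory")
    if total_gb < 12:
        print("Warning: less than 12 GB of GPU memory; fine-tuning may run out of memory.")
else:
    print("No GPU detected - set Runtime | Change runtime type | Hardware accelerator to GPU.")
```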
Keep in mind that fine-tuning Whisper will take several hours, so monitor your running notebook in Colab regularly.
This chapter teaches you how to fine-tune the Whisper model to recognize speech in multiple languages, using tools such as Hugging Face Datasets, Transformers, and the Hugging Face Hub. Check out the Google Colab Python notebook in this book’s GitHub repository (https://github.com/PacktPublishing/Learn-OpenAI-Whisper/tree/main/Chapter04) and try fine-tuning yourself.
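For orientation, here is a minimal setup sketch along the lines of what the notebook does. The package list and the openai/whisper-small checkpoint are illustrative assumptions, so defer to the notebook for the exact versions and model size used:

```python
# Install the Hugging Face libraries used for fine-tuning (run once per Colab session):
# !pip install --quiet datasets transformers accelerate evaluate jiwer

from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load a pretrained multilingual Whisper checkpoint as the starting point for fine-tuning.
checkpoint = "openai/whisper-small"  # illustrative choice; the notebook may use a different size
processor = WhisperProcessor.from_pretrained(checkpoint)
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)

print(f"Loaded {checkpoint} with {model.num_parameters() / 1e6:.0f}M parameters")
```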
The general recommendation is to follow the Colab notebook and upload model checkpoints directly to the Hugging Face Hub while training. The Hub provides...