Learn OpenAI Whisper: Transform your understanding of GenAI through robust and accurate speech processing solutions

Product type: Paperback

Published: May 2024

Publisher: Packt

ISBN-13: 9781835085929

Length: 372 pages

Edition: 1st Edition

Concepts: GPT/LLMs

Author: Josué R. Batista

Table of Contents (16) Chapters

Preface

1. Part 1: Introducing OpenAI’s Whisper

2. Chapter 1: Unveiling Whisper – Introducing OpenAI’s Whisper

3. Chapter 2: Understanding the Core Mechanisms of Whisper

4. Part 2: Underlying Architecture

5. Chapter 3: Diving into the Whisper Architecture

6. Chapter 4: Fine-Tuning Whisper for Domain and Language Specificity

7. Part 3: Real-world Applications and Use Cases

8. Chapter 5: Applying Whisper in Various Contexts

9. Chapter 6: Expanding Applications with Whisper

10. Chapter 7: Exploring Advanced Voice Capabilities

11. Chapter 8: Diarizing Speech with WhisperX and NVIDIA’s NeMo

12. Chapter 9: Harnessing Whisper for Personalized Voice Synthesis

13. Chapter 10: Shaping the Future with Whisper

14. Index

15. Other Books You May Enjoy

To get the most out of this book

For most of the book, you only need a Google account and internet access to run the Whisper code in Google Colaboratory (Colab). The free version of Colab, including its GPU runtime, requires no paid subscription. Readers familiar with Python can run the code examples in a local environment instead of using Colab.

Software/hardware covered in the book	Operating system requirements
Google Colaboratory (Colab)	Web browser on Windows, macOS, or Linux
Google Drive
YouTube
RSS
GitHub
Python
Hugging Face
Gradio
Foundational models: Google’s gTTS, StableLM Zephyr 3B – GGUF, LLaVA
Intel’s OpenVINO
NVIDIA’s NeMo
Microphone and speakers

Whisper’s small model requires at least 12 GB of GPU memory, so let’s try to secure a capable GPU for our Colab session. Unfortunately, being allocated a good GPU on the free version of Google Colab (e.g., a Tesla T4 with 16 GB) is becoming much harder. With Google Colab Pro, however, we should have no trouble being allocated a V100 or P100 GPU.
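Before starting a long notebook, it can help to check which GPU the session was actually given. The following is a minimal sketch, not code from the book: it assumes the `nvidia-smi` command is on the path (as it normally is on a Colab GPU runtime), and the helper names `parse_gpu_line`, `query_gpus`, and `meets_requirement` are hypothetical. The 12 GB threshold is simply the figure quoted above.

```python
import subprocess

REQUIRED_GIB = 12  # threshold taken from the figure quoted in the text


def parse_gpu_line(line):
    """Turn one CSV line such as 'Tesla T4, 15360' into (name, memory in GiB)."""
    name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
    # With the 'nounits' format option, nvidia-smi reports memory in MiB.
    return name, int(mem_mib) / 1024


def query_gpus():
    """Return a list of (name, GiB) tuples, or [] when no NVIDIA GPU is visible."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No driver or no GPU runtime attached (for example, a CPU-only session).
        return []
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


def meets_requirement(gpus, required_gib=REQUIRED_GIB):
    """True if at least one listed GPU has the required amount of memory."""
    return any(mem_gib >= required_gib for _, mem_gib in gpus)


if __name__ == "__main__":
    gpus = query_gpus()
    for name, mem_gib in gpus:
        print(f"{name}: {mem_gib:.1f} GiB")
    print("Enough GPU memory:", meets_requirement(gpus))
```

If the check reports no GPU in Colab, switching the runtime type to a GPU runtime and reconnecting is the usual first step.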

If you are using the digital version of this book, we advise you to type the code yourself or access it from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to copying and pasting code.

Fine-tuning Whisper in Chapter 4 takes at least one hour, so check on your running Colab notebook regularly during that time.

Some notebooks implement a Gradio app with voice recording and audio playback. A microphone and speakers connected to your computer will let you experience these interactive voice features. Alternatively, open the URL that Gradio provides at runtime on your mobile phone; from there, you may be able to use the phone’s microphone to record your voice.
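To make the Gradio setup concrete, here is a rough sketch of a microphone-to-text app. It is an illustration under stated assumptions rather than one of the book's notebooks: it assumes the `openai-whisper` and `gradio` packages are installed, that Gradio is version 4 or later (older releases spell the microphone option `source="microphone"`), and the function names `load_model`, `transcribe`, and `build_demo` are made up for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(name="small"):
    """Load a Whisper checkpoint once and reuse it across requests."""
    import whisper  # provided by the openai-whisper package; imported lazily

    return whisper.load_model(name)


def transcribe(audio_path):
    """Return the transcript of a recorded file, or '' if nothing was recorded."""
    if audio_path is None:  # Gradio passes None when no audio was captured
        return ""
    result = load_model().transcribe(audio_path)
    return result["text"].strip()


def build_demo():
    """Build a Gradio interface that records from the microphone."""
    import gradio as gr  # imported lazily so the helpers above work without it

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Audio(sources=["microphone"], type="filepath"),
        outputs="text",
    )


# In a notebook cell, start the app with:
#     build_demo().launch(share=True)
# share=True asks Gradio for a temporary public URL, which is the link
# that can be opened on a phone to record with the phone's microphone.
```

Loading the model lazily and caching it keeps the first request slow but later ones fast, which suits a short-lived Colab session.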

By meeting these technical requirements, you will be prepared to explore Whisper in different contexts while enjoying the streamlined experience of Google Colab and the comprehensive resources available on GitHub.