Learn OpenAI Whisper

You're reading from Learn OpenAI Whisper: Transform your understanding of GenAI through robust and accurate speech processing solutions

Product type Paperback
Published in May 2024
Publisher Packt
ISBN-13 9781835085929
Length 372 pages
Edition 1st Edition
Concepts
Author (1): Josué R. Batista

Table of Contents (16 chapters)

Preface
1. Part 1: Introducing OpenAI’s Whisper
2. Chapter 1: Unveiling Whisper – Introducing OpenAI’s Whisper
3. Chapter 2: Understanding the Core Mechanisms of Whisper
4. Part 2: Underlying Architecture
5. Chapter 3: Diving into the Whisper Architecture
6. Chapter 4: Fine-Tuning Whisper for Domain and Language Specificity
7. Part 3: Real-world Applications and Use Cases
8. Chapter 5: Applying Whisper in Various Contexts
9. Chapter 6: Expanding Applications with Whisper
10. Chapter 7: Exploring Advanced Voice Capabilities
11. Chapter 8: Diarizing Speech with WhisperX and NVIDIA’s NeMo
12. Chapter 9: Harnessing Whisper for Personalized Voice Synthesis
13. Chapter 10: Shaping the Future with Whisper
14. Index
15. Other Books You May Enjoy

Conventions used

Several text conventions are used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. An example is “Users can even provide audiovisual formats such as .mp4 as inputs, as Whisper will extract just the audio stream to process.”

A block of code is set as follows:

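# Load the Hindi ("hi") subset of Common Voice 11.0 from the Hugging Face Hub.
from datasets import load_dataset, DatasetDict

common_voice = DatasetDict()
# use_auth_token=True passes the locally saved Hugging Face access token;
# newer releases of the datasets library rename this argument to token.
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="train+validation", use_auth_token=True)
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="test", use_auth_token=True)
print(common_voice)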

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

!pip install --upgrade pip
!pip install --upgrade datasets transformers accelerate soundfile librosa evaluate jiwer tensorboard gradio

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “To get a GPU, within Google Colab’s main menu, click Runtime | Change runtime type, then change the Hardware accelerator from None to GPU.”

Tips or important notes appear like this.
