You're reading from Learn OpenAI Whisper Transform your understanding of GenAI through robust and accurate speech processing solutions

Product type Paperback

Published in May 2024

Publisher Packt

ISBN-13 9781835085929

Length 372 pages

Edition 1st Edition

Concepts

GPT/LLMs

Author (1):

Josué R. Batista

Table of Contents (16) Chapters

Preface

1. Part 1: Introducing OpenAI’s Whisper

2. Chapter 1: Unveiling Whisper – Introducing OpenAI’s Whisper

3. Chapter 2: Understanding the Core Mechanisms of Whisper

4. Part 2: Underlying Architecture

5. Chapter 3: Diving into the Whisper Architecture

6. Chapter 4: Fine-Tuning Whisper for Domain and Language Specificity

7. Part 3: Real-world Applications and Use Cases

8. Chapter 5: Applying Whisper in Various Contexts

9. Chapter 6: Expanding Applications with Whisper

10. Chapter 7: Exploring Advanced Voice Capabilities

11. Chapter 8: Diarizing Speech with WhisperX and NVIDIA’s NeMo

12. Chapter 9: Harnessing Whisper for Personalized Voice Synthesis

13. Chapter 10: Shaping the Future with Whisper

14. Index

Why subscribe?

15. Other Books You May Enjoy

Milestone 2 – Incorporating the Common Voice 11 dataset

The Common Voice dataset, spearheaded by Mozilla, is an effort to democratize speech technology through open, diverse speech corpora. A dataset is a structured collection of data in which rows typically represent individual observations or instances and columns represent the features or variables of those instances. In Common Voice, each row is an audio recording, and each column holds a feature of that recording. As a continually growing, community-driven initiative covering more than 100 languages, Common Voice is well suited to augmenting multilingual speech recognition systems such as Whisper.
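To make the row-and-column picture concrete, here is a small illustrative sketch in plain Python. The field names (path, sentence, locale, up_votes, down_votes) follow the Common Voice schema as published on the Hugging Face Hub, to the best of my understanding; the values are invented for illustration and are not real Common Voice records.

```python
# Illustrative rows only: field names are assumed from the Common Voice
# schema; every value below is made up for this sketch.
rows = [
    {"path": "clip_0001.mp3", "sentence": "Hello world.",
     "locale": "en", "up_votes": 2, "down_votes": 0},
    {"path": "clip_0002.mp3", "sentence": "Good morning.",
     "locale": "en", "up_votes": 3, "down_votes": 1},
]

# Columns are the keys shared by every row (one per feature).
columns = list(rows[0].keys())

# Selecting one column yields that feature's value for each audio record.
sentences = [row["sentence"] for row in rows]
```

Each dictionary plays the role of one row (one audio record), and each key plays the role of one column.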

Integrating Common Voice data is straightforward with the Hugging Face Datasets library. We load the desired language split in streaming mode, which avoids downloading the full dataset to local storage and lets fine-tuning start sooner:

from datasets import load_dataset, DatasetDict
common_voice...
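As a minimal sketch of the streaming load described above (not the chapter's own listing, which is cut off here), the following shows one way it might look. The Hub identifier `mozilla-foundation/common_voice_11_0`, the language code `"hi"`, and the helper names `stream_common_voice` and `take` are assumptions introduced for illustration. The dataset is gated, so you would need to accept its terms and be logged in to the Hugging Face Hub, and newer releases of the library may require additional arguments or handle this dataset differently.

```python
from itertools import islice


def stream_common_voice(language="hi", split="train"):
    """Return a lazily streamed Common Voice split (hypothetical helper)."""
    # Imported inside the function so the sketch loads even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    # Assumption: Common Voice 11 is published under this Hub ID.
    # streaming=True yields examples on demand instead of downloading
    # the whole split first.
    return load_dataset(
        "mozilla-foundation/common_voice_11_0",
        language,
        split=split,
        streaming=True,
    )


def take(stream, n):
    """Materialize only the first n examples of any iterable stream."""
    return list(islice(stream, n))
```

With network access and Hub credentials in place, `take(stream_common_voice(), 2)` would fetch just two examples, which is a cheap way to inspect the fields before committing to a full fine-tuning run.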
