PVS step 1 – Converting audio files into LJSpeech format
This section and the accompanying notebook, LOAIW_ch09_2_Processing_audio_to_LJ_format_with_Whisper_OZEN.ipynb, represent the initial step in the three-step PVS process outlined in this chapter. This step takes an audio sample of the target voice as input and processes it into the LJSpeech dataset format. The notebook demonstrates using the OZEN Toolkit and OpenAI’s Whisper to extract speech, transcribe it, and organize the data according to the LJSpeech structure. The resulting LJSpeech-formatted dataset, consisting of segmented audio files and corresponding transcriptions, serves as the input for the second step, PVS step 2 – Fine-tuning a discrete variational autoencoder using the DLAS toolkit, in which a PVS model is fine-tuned on this dataset.
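To make the flow concrete, here is a minimal sketch of the step's core idea: transcribe an input recording with Whisper, slice the audio at the segment boundaries Whisper reports, and write the pieces out in an LJSpeech-style layout (a wavs/ folder plus a pipe-delimited metadata file). This is not the OZEN Toolkit or the notebook's exact code; the file paths, model size, and 22,050 Hz mono output are illustrative assumptions.

```python
import os
import whisper
from pydub import AudioSegment

INPUT_AUDIO = "target_voice.wav"   # hypothetical input recording of the target voice
OUTPUT_DIR = "ljspeech_dataset"    # hypothetical output folder

os.makedirs(os.path.join(OUTPUT_DIR, "wavs"), exist_ok=True)

# Transcribe the recording; Whisper returns timestamped segments.
model = whisper.load_model("medium")
result = model.transcribe(INPUT_AUDIO)

# Resample to a single-channel 22,050 Hz signal, a common choice for TTS corpora.
audio = AudioSegment.from_file(INPUT_AUDIO).set_frame_rate(22050).set_channels(1)

metadata_lines = []
for i, seg in enumerate(result["segments"]):
    clip_id = f"segment_{i:04d}"
    # pydub slices in milliseconds; cut the clip at the segment boundaries.
    clip = audio[int(seg["start"] * 1000):int(seg["end"] * 1000)]
    clip.export(os.path.join(OUTPUT_DIR, "wavs", f"{clip_id}.wav"), format="wav")
    metadata_lines.append(f"{clip_id}|{seg['text'].strip()}")

# LJSpeech-style metadata: one "file_id|transcription" line per clip.
with open(os.path.join(OUTPUT_DIR, "metadata.csv"), "w", encoding="utf-8") as f:
    f.write("\n".join(metadata_lines))
```

The key design point is that the transcription and the segmentation come from the same Whisper pass, so each exported clip is guaranteed to line up with its text.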
An LJSpeech-formatted dataset is crucial for TTS model training, as it provides a standardized structure for organizing audio files and their corresponding transcriptions...
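For reference, the original LJSpeech corpus pairs a wavs/ folder of short clips with a pipe-delimited metadata.csv whose columns are a file ID, the raw transcription, and a normalized transcription. The layout below is a schematic illustration; the exact file and folder names produced by the OZEN Toolkit may differ.

```text
ljspeech_dataset/
├── metadata.csv
└── wavs/
    ├── segment_0000.wav
    ├── segment_0001.wav
    └── ...

# metadata.csv, one clip per line: file_id|raw transcription|normalized transcription
segment_0000|Dr. Smith arrived at 10 a.m.|Doctor Smith arrived at ten a m
```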