Learn OpenAI Whisper: Transform your understanding of GenAI through robust and accurate speech processing solutions

Product type Paperback

Published in May 2024

Publisher Packt

ISBN-13 9781835085929

Length 372 pages

Edition 1st Edition

Concepts

GPT/LLMs

Author (1):

Josué R. Batista

Augmenting Whisper with speaker diarization

Speaker diarization, the task of partitioning an audio stream into segments according to speaker identity, is a powerful feature in multispeaker speech processing. It addresses the question of "who spoke when?" in a given audio clip, which is crucial to enhancing the functionality and usability of ASR systems. The origins of speaker diarization can be traced back to the 1990s, when the foundational work for clustering-based diarization paradigms was laid down. These early studies focused on radio broadcast news and communications applications, primarily aiming to improve ASR performance. The features used in these early studies were mainly handcrafted, with Mel-frequency cepstral coefficients (MFCCs) being a common choice.
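To make the "who spoke when?" idea concrete, here is a minimal sketch of the clustering-based approach in plain Python. It is an illustration under strong assumptions, not how any particular toolkit does it: the per-frame feature vectors are hand-made 2-D toy values standing in for real MFCCs, the number of speakers is assumed to be known in advance, and every name (`TOY_FRAMES`, `kmeans`, `to_segments`, `Segment`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: int   # cluster index standing in for a speaker identity


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames: Sequence[Vector], k: int, iterations: int = 10) -> List[int]:
    """Assign each frame to one of k clusters (toy k-means, deterministic init)."""
    # Assumption: initial centroids are picked at evenly spaced frame positions.
    step = max(1, len(frames) // k)
    centroids = [frames[i * step] for i in range(k)]
    labels = [0] * len(frames)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every frame.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in frames
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, label in zip(frames, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def to_segments(labels: Sequence[int], frame_seconds: float) -> List[Segment]:
    """Merge consecutive frames with the same label into speaker segments."""
    segments: List[Segment] = []
    for i, speaker in enumerate(labels):
        start = i * frame_seconds
        end = start + frame_seconds
        if segments and segments[-1].speaker == speaker:
            segments[-1].end = end  # extend the current speaker's turn
        else:
            segments.append(Segment(start, end, speaker))
    return segments


# Hand-made toy features: one 2-D vector per 0.5 s frame (not real MFCCs).
TOY_FRAMES: List[Vector] = [
    (1.0, 1.1), (0.9, 1.0), (1.1, 0.9),   # first voice
    (5.0, 5.1), (4.9, 5.0), (5.1, 4.9),   # second voice
    (1.0, 0.9), (0.9, 1.1),               # first voice again
]

for segment in to_segments(kmeans(TOY_FRAMES, k=2), frame_seconds=0.5):
    print(f"{segment.start:.1f}s to {segment.end:.1f}s: speaker {segment.speaker}")
```

In a real pipeline, the toy vectors would be replaced by features or embeddings extracted from audio, and the speaker count would usually have to be estimated rather than supplied.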

Over time, the field of speaker diarization has seen significant advancements, particularly with the emergence of deep learning technology. Modern diarization systems often leverage neural networks and large-scale GPU computing...