Part 2: Underlying Architecture
In this part, you will explore the technical backbone of OpenAI’s Whisper, examining its architecture and the transformer model that drives its cutting-edge ASR capabilities. You will gain a comprehensive understanding of Whisper’s inner workings, including its encoder-decoder mechanics, its multitasking and multilingual capabilities, and its training approach, which uses weak supervision on large-scale data. Additionally, you will learn how to fine-tune Whisper for specific domain and language needs, enabling you to customize it and integrate it effectively into various applications.
This part includes the following chapters:
- Chapter 3, Diving into the Whisper Architecture
- Chapter 4, Fine-Tuning Whisper for Domain and Language Specificity