Summary
In this chapter, we covered the fundamentals of audio data: waveforms, sample rates, and the discrete nature of digital audio. These fundamentals are the building blocks of audio analysis. We compared spectrograms with mel spectrograms and used both to visualize how the frequency content of an audio signal changes over time and how that content relates to human perception of sound. Visualization is a powerful way to gain insight into the structure and characteristics of audio. With the knowledge and techniques gained in this chapter, we are better equipped to explore speech recognition, music classification, and the many other applications where sound takes center stage.
In the next chapter, we will learn how to label audio data using CNNs and how to perform speech recognition using the Whisper model and Azure Cognitive Services.