Real-world audio datasets
By now, you should be familiar with downloading Pluto and fetching real-world datasets from the Kaggle website. We start from the Chapter 2 version of Pluto because the image augmentation functions shown in Chapters 3 and 4 and the text augmentation techniques shown in Chapters 5 and 6 are of no use for audio augmentation.
The three real-world audio datasets we will use are as follows:
- The Musical Emotions Classification (MEC) real-world audio dataset from Kaggle contains 2,126 songs separated into train and test folders. The songs are instrumental, and the goal is to classify each one as happy or sad. Each piece is about 9 to 10 minutes long and is in *.wav format. The dataset was published in 2020 and is available to the public. Its license is Attribution-ShareAlike 4.0 International (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/.
- The Crowd Sourced Emotional Multimodal Actors Dataset (CREMA-D) real-world audio dataset from Kaggle contains...