So far, we've used Magenta's pre-trained models because they are powerful and easy to use. Training our own models is crucial, however, because it lets us generate music in a specific style, or with specific structures or instruments. The first step before training a model is building and preparing a dataset. To do that, we will first look at existing datasets and APIs that help us find meaningful data. Then, we will build two MIDI datasets for specific styles: dance and jazz. Finally, we will prepare the MIDI files for training using data transformations and pipelines.
The following topics will be covered in this chapter:
- Looking at existing datasets
- Building a dance music dataset
- Building a jazz dataset
- Preparing the data using pipelines