In the previous sections, we looked at existing datasets and developed tools to find and extract specific content. By doing so, we've effectively built the dataset we want to train our model on. But building the dataset isn't enough; we also need to prepare it. By preparing, we mean removing everything that isn't useful for training, cutting and splitting tracks, and automatically adding more content.
In this section, we'll look at some built-in utilities that we can use to transform the different data formats (MIDI, MusicXML, and ABCNotation) into a training-ready format. These utilities are called pipelines in Magenta; a pipeline is a sequence of operations that are executed on the input data.
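To make the idea of "a sequence of operations executed on the input data" concrete, here is a minimal sketch in plain Python. It does not use Magenta's pipeline classes: `Melody`, `discard_short`, `transpose`, and `run_pipeline` are hypothetical names invented for this illustration, and the two operations (dropping melodies below an arbitrary note count, and adding transposed copies) are assumed examples of removing and adding content, not Magenta's actual behavior.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for a melody: a list of MIDI pitches.
Melody = List[int]
# An operation takes one item and returns zero or more output items.
Operation = Callable[[Melody], List[Melody]]


def discard_short(min_notes: int) -> Operation:
    """Build an operation that drops melodies with fewer than min_notes notes."""
    def op(melody: Melody) -> List[Melody]:
        return [melody] if len(melody) >= min_notes else []
    return op


def transpose(semitones: Iterable[int]) -> Operation:
    """Build an operation that emits one transposed copy per offset."""
    offsets = list(semitones)

    def op(melody: Melody) -> List[Melody]:
        return [[pitch + offset for pitch in melody] for offset in offsets]
    return op


def run_pipeline(operations: List[Operation], inputs: List[Melody]) -> List[Melody]:
    """Apply each operation in order, feeding its outputs to the next one."""
    items = inputs
    for operation in operations:
        items = [out for item in items for out in operation(item)]
    return items


melodies = [[60, 62, 64, 65], [72]]
# The one-note melody is discarded; the other is kept and also copied an octave up.
prepared = run_pipeline([discard_short(2), transpose([0, 12])], melodies)
print(prepared)  # [[60, 62, 64, 65], [72, 74, 76, 77]]
```

Because each operation returns a list, a single step can discard an item (empty list), pass it through (one element), or multiply it (several elements), which covers the removing, splitting, and augmenting cases described above with one uniform interface.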
One example of an operation that is already implemented in pipelines is discarding melodies...