Training a machine learning model is an empirical, iterative process: we prepare the data and the configuration, train the model, fail, and start over. Getting a model to train on the first try is rare, but we'll persevere through hardship together.
When launching a training phase, we'll watch specific metrics to verify that our model is training properly and converging. We'll also launch an evaluation phase, which runs on a separate, smaller dataset, to verify that the model generalizes to data it hasn't seen yet.
The evaluation dataset is often called the validation dataset in machine learning, but we'll keep the term evaluation since it is the one used in Magenta.
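To make the idea of a separate, smaller evaluation dataset concrete, here is a minimal sketch of how a collection of files could be split in two. It uses only the Python standard library; the function name `split_train_eval`, the `eval_ratio` parameter, and the file names are hypothetical and chosen for illustration, not part of Magenta.

```python
import random


def split_train_eval(items, eval_ratio=0.1, seed=42):
    """Shuffle a copy of `items` and split it into (train, eval) lists.

    Hypothetical helper for illustration only, not a Magenta function.
    """
    if not 0.0 < eval_ratio < 1.0:
        raise ValueError("eval_ratio must be strictly between 0 and 1")
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Keep at least one item for evaluation.
    eval_count = max(1, int(len(shuffled) * eval_ratio))
    return shuffled[eval_count:], shuffled[:eval_count]


# Made-up file names standing in for a real dataset.
files = [f"sequence_{i:03d}.mid" for i in range(100)]
train_files, eval_files = split_train_eval(files)
print(len(train_files), len(eval_files))
```

Shuffling before splitting avoids putting, for example, all the files of one artist in the evaluation set, and the two lists never overlap, so the model is always evaluated on data it hasn't been trained on.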
The validation dataset is different from the test dataset, which is an external dataset, often curated by hand...