Data partitioning
In earlier chapters, we divided our datasets into subsets used for training, validation, and testing.
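The three-way division described above can be illustrated with a minimal sketch. The function name `partition`, the 80/10/10 proportions, and the placeholder utterances are assumptions made for illustration; they are not taken from the earlier chapters, and a library routine could be used instead.

```python
import random


def partition(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle examples and split them into train, validation, and test lists.

    The fractions and the seed are illustrative defaults, not recommended values.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")

    # Copy before shuffling so the caller's data is left untouched.
    shuffled = list(examples)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)

    # Slice three non-overlapping pieces out of the shuffled list.
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical data: 100 placeholder utterances.
utterances = [f"utterance {i}" for i in range(100)]
train, val, test = partition(utterances)
print(len(train), len(val), len(test))
```

Shuffling before slicing keeps any ordering in the original data (for example, examples grouped by intent) from ending up concentrated in one subset, and fixing the seed means the same examples land in the same subset every time the split is repeated.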
As a reminder, training data is used to develop the NLU model that performs the eventual task of the NLU application, whether that is classification, slot-filling, intent recognition, or almost any other NLU task.
Validation data (sometimes called development test data) is used during training to assess the model on data that it was not trained on. This is important because a system that is tested on its own training data could get a good result simply by, in effect, memorizing that data. Such a result would be misleading because that kind of system isn’t very useful: we want the system to generalize, that is, to work well on the new data that it will receive once it is deployed. Validation data can also be used to help tune hyperparameters in machine learning applications, but this means that during development, the system has been exposed...