Mitigating Risk at Training by Validating and Maintaining Datasets
The training process for your model determines the output that your application provides when faced with data it hasn’t seen before. If the model is flawed in any way, then it’s not reasonable to expect unflawed output from the model. The testing process helps verify the model, but only when the data used for testing is accurate. Consequently, the datasets you use for training and testing your model are critical in a way that no other data you feed to your model is. Even with feedback (input that constantly changes the model based on the data it sees), initial training and testing sets the tone for the model and therefore remain critical. Assuming that your dataset is properly vetted, of the right size, and contains the right data, you still have to protect it from a wide variety of threats. This chapter assumes that you’ve started with a good dataset, but some internal or external entity wants...