Identifying data drift
In a typical tabular dataset, it is relatively easy to tell whether an incoming data point is an outlier by comparing it against the summary statistics of the dataset the model was trained on. Computer vision models, however, are not as straightforward – we have already seen their quirks in Chapter 4, where shifting an image by just a few pixels changed the predicted class. Translation is not the only scenario of data drift for images, though. There are any number of ways in which the data reaching the production model can differ from the data the model was trained on. Some differences are obvious, such as the lighting being off or the expected subject being absent from the image; others are subtle shifts that the human eye cannot detect.
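To make the idea concrete before we dive into image-specific techniques, here is a minimal sketch of drift detection on a simple per-image statistic. It compares the distribution of mean brightness between a training set and a production batch using a two-sample Kolmogorov–Smirnov test; the function name, the choice of brightness as the statistic, and the 0.05 threshold are illustrative assumptions, not a prescribed method.

```python
import numpy as np
from scipy.stats import ks_2samp


def brightness_drift(train_images, prod_images, alpha=0.05):
    """Flag drift by comparing mean-brightness distributions.

    Illustrative sketch: a two-sample KS test on one scalar
    statistic per image. Real pipelines track many statistics
    or learned embeddings, not just brightness.
    """
    train_means = np.array([img.mean() for img in train_images])
    prod_means = np.array([img.mean() for img in prod_images])
    statistic, p_value = ks_2samp(train_means, prod_means)
    return {
        "statistic": statistic,
        "p_value": p_value,
        "drift": bool(p_value < alpha),
    }


# Synthetic example: production images with a lighting shift
rng = np.random.default_rng(0)
train = rng.normal(0.5, 0.1, size=(200, 32, 32))   # training-like images
brighter = rng.normal(0.7, 0.1, size=(200, 32, 32))  # lighting is off
print(brightness_drift(train, brighter))
```

A single scalar like brightness will miss many kinds of drift, which is exactly why the rest of this section looks at richer ways of comparing image distributions.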
In this section, we will look at ways of measuring drift, in real time, between the input images (real-world images for prediction) and the images that were used during the training of the...