Measuring drift
Two things matter when dealing with drift. First, we need to be able to measure it, because we cannot counteract something we are not aware of. Second, once we are aware of drift, we need to define the right strategies for counteracting it. Let's discuss measuring drift first.
Measuring data drift
As described earlier, data drift means that the measurements change slowly over time, whereas the underlying concepts stay the same. Descriptive statistics are very useful for measuring this. As you have seen many descriptive statistics in earlier chapters, we will not repeat the theory behind them.
To measure data drift with descriptive statistics, you can choose a number of statistics and track them over time. For each variable, you could track the following:
- Measurements of centrality (mean, median, mode)
- Measurements of variation (standard deviation, variance, interquartile range, or IQR...
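As a minimal sketch of tracking such statistics over time, the following Python example groups one variable into time windows and computes a few of the measures listed above for each window. It assumes the data sits in a pandas DataFrame with a timestamp column. The function names `iqr` and `drift_summary`, the parameter names, and the weekly window are hypothetical choices made for illustration only.

```python
import pandas as pd


def iqr(values: pd.Series) -> float:
    """Interquartile range: 75th percentile minus 25th percentile."""
    return values.quantile(0.75) - values.quantile(0.25)


def drift_summary(
    df: pd.DataFrame,
    time_col: str,
    value_col: str,
    freq: str = "W",  # assumption: weekly windows; pick what suits your data
) -> pd.DataFrame:
    """Compute descriptive statistics of one variable per time window.

    Hypothetical helper: returns one row per window, with columns for
    centrality (mean, median) and variation (std, var, iqr).
    """
    # Group the rows into consecutive time windows based on the timestamp.
    windows = df.groupby(pd.Grouper(key=time_col, freq=freq))[value_col]
    # The custom function is reported under its own name, "iqr".
    return windows.agg(["mean", "median", "std", "var", iqr])
```

Calling `drift_summary(df, "timestamp", "temperature")` on a table with those two (hypothetical) columns would give one row of statistics per week. Comparing the rows over time, or plotting each column, is one simple way to spot a slow shift in a variable.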