Data quality
When designing and developing machine learning systems, we usually consider data quality at a relatively low level: we look for missing values, outliers, and similar problems. These matter because they can cause problems when training machine learning models. Nevertheless, they are not nearly enough from a software engineering perspective.
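To make the low-level checks concrete, here is a minimal sketch in plain Python. The function names (count_missing, iqr_outliers), the sample readings, and the choice of the interquartile-range rule with k = 1.5 are illustrative assumptions, not something prescribed by the text; other outlier criteria would work just as well.

```python
import math
from statistics import quantiles


def _is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(values):
    """Count the entries that are None or NaN."""
    return sum(1 for v in values if _is_missing(v))


def iqr_outliers(values, k=1.5):
    """Return the non-missing values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = [v for v in values if not _is_missing(v)]
    q1, _, q3 = quantiles(clean, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in clean if v < low or v > high]


# Hypothetical sensor readings with two gaps and one suspicious value.
readings = [10.1, 9.8, None, 10.3, float("nan"), 10.0, 57.2, 9.9, 10.2, 9.7]

missing = count_missing(readings)
outliers = iqr_outliers(readings)
print(f"missing: {missing}, outliers: {outliers}")
```

Checks like these tell us that a value is absent or unusual, but nothing about whether the data can be trusted or whether it reflects the population we care about.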
When engineering reliable software systems, we need to know more about the data we use than whether it contains missing values. We need to know whether we can trust the data (whether it is believable), whether it is representative, and whether it is up to date. So, we need a quality model for our data.
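Some of these higher-level properties can be checked automatically. As one illustration, the sketch below tests whether a data set is up to date by comparing its newest record against a maximum allowed age. The function name is_up_to_date, the seven-day threshold, and the sample timestamps are all assumptions made for this example; what counts as "up to date" depends on the system.

```python
from datetime import datetime, timedelta, timezone


def is_up_to_date(timestamps, now, max_age):
    """Return True if the newest timestamp is at most max_age older than now."""
    if not timestamps:
        return False  # an empty data set cannot be considered current
    return now - max(timestamps) <= max_age


# Hypothetical collection times of two batches of data.
collected = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 8, tzinfo=timezone.utc),
]
now = datetime(2024, 1, 10, tzinfo=timezone.utc)

fresh_within_week = is_up_to_date(collected, now, timedelta(days=7))
fresh_within_day = is_up_to_date(collected, now, timedelta(days=1))
print(fresh_within_week, fresh_within_day)
```

Believability and representativeness are harder to reduce to a single comparison, which is one reason a structured quality model is useful.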
There are several quality models for data in software engineering, and the one I often use, and recommend, is the AIMQ model – a methodology for assessing information quality.
The quality dimensions of the AIMQ model are as follows (cited from Lee, Y.W., et al., AIMQ: a methodology for information quality assessment...