An introduction to data quality rules
A data quality rule is logic applied to each row of a dataset to determine whether that row is correct or incorrect. Correct data is deemed to have passed the rule, and incorrect data is deemed to have failed it – hence the term failed data, which is used heavily in Chapter 7.
Data quality rules always give a Boolean output – in other words, a row of data always passes or fails.
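As a rough illustration, a rule can be sketched as a function that takes one row and returns True (passed) or False (failed). The sketch below uses a simple completeness check on a supplier's VAT number. The field names (supplier_id, vat_number), the sample rows, and the choice to treat a whitespace-only value as blank are all assumptions made for the example, not part of any particular tool.

```python
from typing import Optional


def vat_number_is_complete(row: dict) -> bool:
    """Return True (pass) if the row has a non-blank VAT number, else False (fail)."""
    # "vat_number" is a hypothetical column name used for illustration.
    value: Optional[str] = row.get("vat_number")
    # Assumption: null and blank (empty or whitespace-only) values both fail.
    return value is not None and value.strip() != ""


# Hypothetical supplier rows; the values are made up for the example.
suppliers = [
    {"supplier_id": 1, "vat_number": "GB123456789"},
    {"supplier_id": 2, "vat_number": None},
    {"supplier_id": 3, "vat_number": ""},
]

# Apply the rule to every row: each row gets exactly one Boolean result.
results = [vat_number_is_complete(row) for row in suppliers]

# The rows that failed the rule make up the "failed data".
failed_data = [row for row, passed in zip(suppliers, results) if not passed]
```

Because the rule returns only True or False, every row lands in exactly one of two groups, and the failed group can be handed on for investigation or remediation.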
The following table provides a few (purposefully very simple) examples:
Business logic | Passed row example | Failed row example |
The VAT number must be complete for all suppliers. | Any row with any character in this field would pass. | Any row which is “null” or “blank” would fail. |
The VAT number... |