Dirty data and what to do about it
A major use of QlikView is building Data Quality Dashboards. Data quality affects everyone, and we need to bear it in mind when designing the data model.
Background
Different data sources have their own levels of quality issues. Databases have field types, which at least prevent errors such as text in a numeric field or an invalid entry in a date field. Spreadsheets, however, usually do very little in the way of verification. Always try to work with clean data, even if that means extra effort before the real development work starts. Ideally, you can ask the data provider to clean up the data for you!
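To make the contrast concrete, here is a minimal sketch of the kind of type check a database applies automatically and a spreadsheet does not. It is written in Python rather than QlikView script, purely as an illustration. The field names (OrderDate, Amount), the date format, and the sample rows are all hypothetical.

```python
from datetime import datetime


def is_number(value):
    """Return True if the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def is_date(value, fmt="%Y-%m-%d"):
    """Return True if the value is a real calendar date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical rows as they might arrive from a spreadsheet: everything is text.
rows = [
    {"OrderDate": "2023-01-15", "Amount": "120.50"},
    {"OrderDate": "2023-02-30", "Amount": "75"},    # 30 February does not exist
    {"OrderDate": "2023-03-01", "Amount": "n/a"},   # text in a numeric field
]

# Collect the positions of rows that fail either check.
bad_rows = [
    i for i, row in enumerate(rows)
    if not (is_date(row["OrderDate"]) and is_number(row["Amount"]))
]
```

With this sample, the second and third rows are flagged. A typed database column would have rejected both at entry time.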
How to do it
Whatever our data source, we have to assume that there could be issues in the data because what works today may fail to refresh correctly tomorrow.
Data quality issues can broadly be broken down into three types:
- Incorrect or erroneous data
- Inconsistent data
- Duplication
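As a rough illustration of these three categories, the sketch below flags one example of each in a small, made-up set of customer records. It uses Python rather than QlikView script. The field names, the plausible age range, and the country alias table are assumptions chosen for the example, not rules taken from any real data set.

```python
from collections import Counter

# Hypothetical customer records containing one problem of each type.
records = [
    {"CustomerID": 1, "Country": "UK", "Age": 34},
    {"CustomerID": 2, "Country": "United Kingdom", "Age": 41},
    {"CustomerID": 3, "Country": "France", "Age": -5},
    {"CustomerID": 1, "Country": "UK", "Age": 34},
]

# 1. Incorrect or erroneous data: a value outside an assumed plausible range.
erroneous = [r["CustomerID"] for r in records if not 0 <= r["Age"] <= 120]

# 2. Inconsistent data: the same country recorded under different spellings.
#    The alias table is an assumption; in practice it comes from the business.
COUNTRY_ALIASES = {"UK": "United Kingdom"}
inconsistent = sorted(
    {r["Country"] for r in records if r["Country"] in COUNTRY_ALIASES}
)

# 3. Duplication: the same key appearing more than once.
key_counts = Counter(r["CustomerID"] for r in records)
duplicated = [key for key, count in key_counts.items() if count > 1]
```

Each check produces a list that could feed a simple quality report: the customers with an implausible age, the country spellings that need mapping to a standard form, and the keys that occur more than once.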
To explain further, incorrect or erroneous data could...