Cleaning up data in the database
Cleaning up data is an important topic; often, the data life cycle is not defined when creating a database application. This leads to tons of outdated data. Unclean data hinders several processes, such as database refactoring. Also, it can have a side effect on all processes in the company, such as wrong report results, billing issues, unauthorized access, and so on.
Getting ready
Several recipes were introduced to determine unused objects, but this is not all. The data itself should be cleaned, and the data life cycle should be defined.
For unclean data, there are several scenarios; however, let's focus here only on duplicated rows due to the missing unique and primary key constraints.
How to do it…
The first step is to identify the tables that do not have unique and primary key constraints. This is quite easy using the information schema, as follows:
SELECT table_catalog, table_schema, table_name FROM information_schema.tables WHERE table_schema NOT IN ...