Dealing with large financial datasets using data validation
When dealing with large financial datasets, the tendency is to tolerate a certain percentage of inaccuracy because of the effort required to clean the data. However, outliers and errors left in the data to save time will distort analysis and report generation. For this reason, thresholds should be defined in advance for each column and set of records, and these guidelines should then be converted into automated processes available in the BI tool.
An example would be a guideline stating that column values cannot be negative, cannot exceed a certain threshold, or must belong to a particular set of values. Such a guideline can be converted into a rule that automatically detects data issues. Once the incorrect records have been tagged accordingly, they can be analyzed and corrected manually. In some...
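As a minimal sketch of this approach, the following Python example (using pandas) encodes three such guidelines as rules and tags every record that violates them. The column names, the threshold value, and the allowed currency set are illustrative assumptions, not values from any real dataset.

```python
import pandas as pd

# Hypothetical sample of transaction records; column names are illustrative.
df = pd.DataFrame({
    "transaction_id": [1001, 1002, 1003, 1004],
    "amount": [250.00, -40.00, 1_250_000.00, 310.50],
    "currency": ["USD", "USD", "EUR", "XYZ"],
})

# Guidelines expressed as rules: each maps a rule name to a predicate
# that returns True for rows violating that guideline.
rules = {
    "amount_negative": lambda d: d["amount"] < 0,
    "amount_over_threshold": lambda d: d["amount"] > 1_000_000,  # assumed threshold
    "currency_invalid": lambda d: ~d["currency"].isin({"USD", "EUR", "GBP"}),
}

def tag_violations(d: pd.DataFrame) -> pd.DataFrame:
    """Tag each record with the names of the rules it violates."""
    d = d.copy()
    d["violations"] = [[] for _ in range(len(d))]
    for name, predicate in rules.items():
        mask = predicate(d)
        for idx in d.index[mask]:
            d.at[idx, "violations"].append(name)
    return d

tagged = tag_violations(df)
# Show only the flagged records, ready for manual review and correction.
print(tagged[tagged["violations"].map(len) > 0])
```

Because each flagged record carries the names of the rules it broke, the manual review step described above can focus only on the specific violations rather than rescanning the whole dataset.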