Data quality
In Optimus, data quality is the process of counting how many values in a column match a specific profiler data type. For example, if the profiler data type of a column is URL, Optimus will count the following:
- Values that match the URL format, such as "google.com".
- Values that do NOT match the URL format, such as "google".
- Null values.
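The three counts described above can be sketched in plain Python. This is only an illustration of the idea, not how Optimus implements it: the URL_PATTERN regular expression and the quality_counts function are hypothetical names introduced here, and the pattern is deliberately simpler than a real URL matcher.

```python
import re

# Hypothetical, simplified URL pattern (an assumption for this sketch):
# optional http/https scheme, a dotted host name, and an optional path.
URL_PATTERN = re.compile(r"^(https?://)?([\w-]+\.)+[a-zA-Z]{2,}(/\S*)?$")


def quality_counts(values, pattern=URL_PATTERN):
    """Count values that match the pattern, do not match it, or are null."""
    counts = {"match": 0, "mismatch": 0, "missing": 0}
    for value in values:
        if value is None:
            # Null values are tallied separately from mismatches.
            counts["missing"] += 1
        elif pattern.match(str(value)):
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


print(quality_counts(["google.com", "google", None]))
# → {'match': 1, 'mismatch': 1, 'missing': 1}
```

Keeping null values in their own bucket, rather than folding them into the mismatches, makes it possible to tell empty cells apart from badly formatted ones.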
Optimus has many data types in the profiler, which are inferred with a combination of regular expressions and number type detection. For reference, in the following table, we list the profiler data types and the Python data types:
These data types are inferred when you run the profiler. You can also change the inferred profiler data type if you are sure that a column should have a specific data type:
from optimus import Optimus
op = Optimus("pandas")
df = op.load...