Profiling to get a better understanding of your data
As enterprises go through digital transformation, data and the systems that hold it proliferate, often producing redundant datasets of varying quality. With large volumes of data, we need a good understanding of each dataset's characteristics: which attributes or columns are available, the data type of each column, its unique values, and how those values are distributed. This profile information helps us isolate the datasets of interest from a large collection of possibly overlapping datasets.
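To make the idea of a profile concrete, here is a minimal sketch in plain Python. It is not how any particular product computes profiles; the function name `profile_rows` and the sample records are hypothetical, made up for illustration. It assumes the data is a small list of dictionaries with hashable values, and it reports, per column, the observed data types, the number of missing values, the number of unique values, and the most frequent values.

```python
from collections import Counter


def profile_rows(rows):
    """Build a simple per-column profile from a list of dict records.

    Illustrative helper only; assumes every value is hashable.
    """
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    profile = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        counts = Counter(present)
        profile[name] = {
            # Python type names seen in the column.
            "types": sorted({type(v).__name__ for v in present}),
            # Records where the value is absent or None.
            "missing": len(values) - len(present),
            # Number of distinct non-missing values.
            "unique": len(counts),
            # Rough view of the distribution: most frequent values.
            "top_values": counts.most_common(3),
        }
    return profile


# Hypothetical sample records, invented for this example.
sample = [
    {"customer_id": 1, "state": "CA", "balance": 120.5},
    {"customer_id": 2, "state": "NY", "balance": None},
    {"customer_id": 3, "state": "CA", "balance": 87.0},
]

for column, stats in profile_rows(sample).items():
    print(column, stats)
```

Even this small summary shows the kind of signal a profile gives: a column whose unique count equals the row count is a candidate identifier, while a high missing count flags a data quality concern.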
With Cloud Pak for Data, we have a centralized governance catalog that acts as the index of all our data assets and helps us organize the resources for many data science projects: data assets, analytical assets, and the users who need these assets. This catalog also has built-in profiling capabilities that allow consumers to select a given dataset...