Data Profiling – Understanding Data Structure, Quality, and Distribution
Data profiling is the process of examining and validating datasets to understand their underlying structure, patterns, and quality. It is a critical step in data management and ingestion because it helps improve data quality and accuracy and supports compliance with regulatory standards. In this chapter, you will learn how to profile data with different tools and how to adapt your approach as the data volume increases.
We will cover the following topics:
- Understanding data profiling
- Data profiling with the pandas profiler
- Data validation with Great Expectations
- Comparing Great Expectations and the pandas profiler – when to use what
- How to profile big data volumes