Chapter 9: Profiling data in Azure
Data profiling is an important part of every data project. It helps the data modeler create an accurate data model, and it tells ETL developers what types of data we have and how clean that data is. It also dictates which transformations we should apply to the data.
Data profiling can help us find which metrics we can derive from the source dataset and how much we need to change (transform) the data to meet business rules. It can also help us find data inconsistencies before the ETL phase starts and derive a valid data model from the source dataset.
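To make these ideas concrete, here is a minimal sketch of what a profile might report for each column: how many values are missing, how many distinct values exist, and which data types appear. It uses only the Python standard library and is not tied to any Azure service. The function names (`infer_type`, `profile`) and the sample rows are hypothetical and exist only for illustration.

```python
from collections import Counter


def infer_type(value):
    """Classify a raw string as 'int', 'float', or 'str'."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "str"


def profile(rows):
    """Return per-column null count, distinct count, and observed types."""
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        values = [row[col] for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(infer_type(v) for v in present)),
        }
    return report


# Hypothetical sample rows, as they might arrive from a raw CSV extract.
sample = [
    {"order_id": "1", "amount": "19.99", "country": "CA"},
    {"order_id": "2", "amount": "", "country": "ca"},
    {"order_id": "3", "amount": "n/a", "country": "US"},
]

report = profile(sample)
for column, stats in report.items():
    print(column, stats)
```

Even on three rows, the report surfaces the kinds of findings described above: `amount` has a missing value and a non-numeric entry that a transformation would need to handle, and `country` has inconsistent casing ("CA" and "ca" count as distinct values).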
The process flow from data ingestion to reporting can be described with the following diagram:
This chapter will focus on the Profiling data step shown in the preceding diagram. Here, you will learn common techniques for profiling data.
In this chapter, we will cover the following recipes:
...