Using Aggregates to Clean Data and Examine Data Quality
In Chapter 1, Introduction to SQL for Analytics, we discussed how SQL can be used to clean data. The techniques covered in that chapter work well, but aggregates add several more that make cleaning data easier and more thorough. In this section, we will look at some of these techniques.
Finding Missing Values with GROUP BY
As we mentioned in Chapter 1, Introduction to SQL for Analytics, one of the biggest issues in cleaning data is dealing with missing values. Although we discussed how to find missing values and how to get rid of them, we said little about how to determine the extent of missing data in a dataset. This was mainly because we did not yet have the tools to summarize information in a dataset – that is, until this chapter.
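As a rough illustration of measuring the extent of missing data, the following sketch counts NULL values in a single column and then shows how GROUP BY places NULLs in a group of their own. The table `sample_customers`, its columns, and its rows are hypothetical and exist only for this example; they are not taken from the book's dataset. The sketch relies on the standard behavior that COUNT(column) skips NULL values while COUNT(*) counts every row.

```sql
-- Hypothetical sample table; the names and values are assumptions for illustration.
CREATE TEMPORARY TABLE sample_customers (
    customer_id INTEGER PRIMARY KEY,
    state       TEXT
);

INSERT INTO sample_customers (customer_id, state) VALUES
    (1, 'NY'),
    (2, NULL),
    (3, 'CA'),
    (4, NULL),
    (5, 'NY');

-- COUNT(*) counts all rows; COUNT(state) counts only rows where state is not NULL.
-- The difference is the number of missing values in the column.
SELECT
    COUNT(*)                                   AS total_rows,
    COUNT(*) - COUNT(state)                    AS missing_state,
    (COUNT(*) - COUNT(state)) * 1.0 / COUNT(*) AS missing_fraction
FROM sample_customers;

-- GROUP BY treats NULL as its own group, so missing values
-- appear as a separate row alongside the real values.
SELECT
    state,
    COUNT(*) AS row_count
FROM sample_customers
GROUP BY state
ORDER BY row_count DESC;
```

Multiplying by 1.0 before dividing avoids integer division, which would otherwise truncate the fraction to zero in most databases. Note that this sketch only detects NULLs; values such as empty strings would need an additional condition.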
Using aggregates, identifying the amount of missing data can tell you not only which...