Chapter 4. Data Quality and Exploration
The previous chapter introduced the general data structure used in Modeler. You learned how to read and display data, and you were introduced to the concepts of measurement level and field roles. Now that you know how to bring data into Modeler, the next step is to assess the quality of that data. In this chapter you will:
- Get an overview of the Data Audit node options
- Go over the results of the Data Audit node
- Be introduced to missing data
- Discuss ways to address missing data
Once your data is in Modeler, you are ready to start exploring it and become familiar with its characteristics. You should review the distribution of each field, both to get to know the dataset and to identify potential problems early. For continuous fields, you will want to inspect the range of values. For categorical fields, you will want to look at the number of distinct values. You will also have to consider...