7.1.2 Approach
This project is based on the initial inspection notebook from Chapter 6, Project 2.1: Data Inspection Notebook. Some of the essential cell content will be reused in this notebook. We’ll build on the components shown in the earlier chapter, specifically the samples_iter() function, which iterates over samples in an open file. This function will be central to working with the raw data.
In the previous chapter, we suggested avoiding conversion functions. When starting down the path of inspecting data, it’s best to assume nothing and look at the text values first.
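As a minimal sketch of looking at the text first, the following cell tallies the distinct text values in one column and checks which of them float() would accept, without keeping any converted value. The samples_iter() body shown here is only a hypothetical stand-in built on csv.DictReader; the function from the earlier chapter may differ. The column names, the sample data, and the looks_numeric() helper are also assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from collections.abc import Iterator
from typing import TextIO


def samples_iter(source: TextIO) -> Iterator[dict[str, str]]:
    """Hypothetical stand-in: yield each row as a dict of raw text values."""
    yield from csv.DictReader(source)


def looks_numeric(text: str) -> bool:
    """Report whether float() accepts the text; the number itself is discarded."""
    try:
        float(text)
    except ValueError:
        return False
    return True


# Made-up sample data, used only to show the shape of the inspection.
raw = io.StringIO("x,y\n10,8.04\n8,6.95\n13,N/A\n")
rows = list(samples_iter(raw))

# Frequency of each distinct text value in column "y".
y_text = Counter(row["y"] for row in rows)

# Text values that float() would reject.
y_rejected = [text for text in y_text if not looks_numeric(text)]
```

Because the rows stay as text, the tally in y_text shows exactly what is in the file, and y_rejected points at the values that need a closer look before any conversion is applied. Note that float() also accepts text such as "nan" and "inf", so a clean result here is a starting point, not proof that every value is a sensible measure.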
There are some common patterns in the source data values:
The values appear to be all numeric. The int() or float() function works on all of the values. There are two sub-cases here:
All of the values seem to be proper counts or measures in some expected range. This is ideal.
A few “outlier” values are present. These are values that seem to be outside the expected...