Identifying missing values
A missing value is the absence of a value for a variable in a given observation. In structured data, it appears as an empty cell in a dataframe and is commonly denoted by NA, NaN, NULL, and so on.
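As a minimal sketch of how such empty cells can be detected, the following snippet assumes the data is held in a pandas dataframe. The scores and the column names (student, class_a, class_b) are hypothetical and are used for illustration only. Here, isna flags each missing cell, and summing the flags gives a count per column:

```python
import numpy as np
import pandas as pd

# Hypothetical scores (illustrative only); both np.nan and None
# end up as NaN in a numeric pandas column.
scores = pd.DataFrame({
    "student": ["S1", "S2", "S3", "S4"],
    "class_a": [80.0, np.nan, 65.0, 90.0],
    "class_b": [70.0, 55.0, None, None],
})

# isna() returns True for every missing cell; sum() counts them per column.
missing_per_column = scores.isna().sum()
print(missing_per_column)

# Keep only the rows that contain at least one missing value.
rows_with_missing = scores[scores.isna().any(axis=1)]
print(rows_with_missing)
```

Counting per column shows which variables are affected, while filtering rows shows which observations are affected.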
Missing values can lead to inaccurate conclusions and biased analysis; therefore, it is important to handle them when encountered in our dataset.
The following example illustrates this:
Figure 9.16: Class assessment scores with missing values (left) and without missing values (right)
From the preceding example, we can deduce the following:
- With missing values: Class B and Class C have one distinction (>=90) each, while Class A has none. The average scores for Classes A, B, and C are 71.25, 63.75, and 75, respectively. Class C has the highest average score.
- Without missing values: All classes have one distinction (>=90) each. The average scores across the classes are 76, 63.75, and 63, respectively. Class A has the...