Handling missing values
We regularly encounter empty fields in data records, so it is best to accept this and learn to handle missing values robustly. Real data can have not only gaps but also wrong values, for example because of faulty measuring equipment. In pandas, missing numerical values are designated as NaN, objects as None, and datetime64 objects as NaT. The outcome of arithmetic operations with NaN values is NaN as well. Descriptive statistics methods, such as summation and average, behave differently: as we observed in an earlier example, they skip NaN values by default, so in a sum a NaN effectively counts as zero. What happens when all the values are NaN depends on the pandas version: versions before 0.22 return NaN for the sum, whereas later versions return 0 unless min_count=1 is passed. In aggregation operations, rows with NaN values in the column that we group by are ignored.

We will again load the WHO_first9cols.csv file into a DataFrame. Recall that this file contains empty fields. Let's select only the first three rows, including the headers of the Country...
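The NaN rules described in this section can be checked with a minimal sketch. It does not use the WHO file: the DataFrame below and its column names (group, value) are hypothetical toy data, and the all-NaN result assumes pandas 0.22 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the WHO file): one missing value per column.
df = pd.DataFrame({
    "group": ["a", "a", np.nan, "b"],
    "value": [1.0, np.nan, 3.0, 5.0],
})

# Arithmetic propagates NaN: the missing entry stays missing.
shifted = df["value"] + 1

# Descriptive statistics skip NaN by default (skipna=True).
total = df["value"].sum()     # 1.0 + 3.0 + 5.0
average = df["value"].mean()  # mean of the three non-missing values

# An all-NaN series sums to 0 in pandas >= 0.22 (assumed here);
# min_count=1 asks for NaN when there are no valid values.
all_nan = pd.Series([np.nan, np.nan])
lenient_total = all_nan.sum()
strict_total = all_nan.sum(min_count=1)

# Rows whose grouping key is NaN are dropped from the aggregation.
grouped = df.groupby("group")["value"].sum()

print(shifted.tolist(), total, average)
print(lenient_total, strict_total)
print(grouped.to_dict())
```

Passing skipna=False to sum or mean switches back to the propagating behavior of plain arithmetic, which can be useful when a missing value should invalidate the whole statistic.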