Handling missing values
We regularly encounter empty fields in data records. It's best that we accept this and learn how to handle this kind of issue in a robust manner. Real data can have gaps, and it can also have wrong values, for example because of faulty measuring equipment. In Pandas, missing numerical values are designated as NaN, missing objects as None, and missing datetime64 values as NaT. The outcome of arithmetic operations with NaN values is also NaN. Descriptive statistics methods, such as summation and average, behave differently. As we observed in an earlier example, these methods skip NaN values by default, which for a sum has the same effect as treating them as zero. The case where all the values are NaN is special: the average is NaN, and the sum is NaN in older Pandas versions, whereas from version 0.22 onward it is 0 unless min_count=1 is passed. In aggregation operations, NaN values in the column that we group by are ignored.

We will again load the WHO_first9cols.csv file into a DataFrame. Remember that this file contains empty fields. Let's only select the first three rows, including the headers of the Country and...
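
The missing-value rules described in this section can be checked with a small sketch. The toy DataFrame below and its column names (group, value) are assumptions made for illustration; they are not the contents of WHO_first9cols.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a file with empty fields.
df = pd.DataFrame({
    "group": ["a", "a", np.nan, "b"],
    "value": [1.0, np.nan, 3.0, 4.0],
})

# Arithmetic propagates NaN: the second element stays missing.
shifted = df["value"] + 10

# Descriptive statistics skip NaN by default (skipna=True).
total = df["value"].sum()     # 1 + 3 + 4
average = df["value"].mean()  # (1 + 3 + 4) / 3

# All-NaN input: the mean is NaN; the sum depends on min_count.
empty = pd.Series([np.nan, np.nan])
all_nan_mean = empty.mean()
all_nan_sum = empty.sum(min_count=1)  # NaN, since no valid value exists

# Grouping drops rows whose key is NaN (dropna=True is the default).
per_group = df.groupby("group")["value"].sum()

# Missing timestamps are represented by NaT.
dates = pd.to_datetime(pd.Series(["2020-01-01", None]))
```

Passing min_count=1 asks sum to return NaN when fewer than one valid value is present, which reproduces the all-NaN behavior on recent Pandas versions.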