Finding missing values
Before starting any analysis, we need to know how many values are missing for each variable, and why those values are missing. We also want to know which rows in our DataFrame are missing values for several key variables. We can get this information with just a couple of statements in pandas.
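As a minimal sketch of those two checks, the following uses a small made-up DataFrame rather than the recipe's data. The column names and values are assumptions chosen for illustration only. It counts missing values per column, then counts them per row across a few key variables.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only
df = pd.DataFrame({
    "total_cases": [100.0, np.nan, 250.0, 40.0],
    "population_density": [35.6, np.nan, np.nan, 120.1],
    "gdp_per_capita": [42000.0, np.nan, 9800.0, np.nan],
})

# Number of missing values for each variable (column)
missing_by_column = df.isnull().sum()

# Number of missing values in each row, restricted to the key variables
key_vars = ["population_density", "gdp_per_capita"]
missing_by_row = df[key_vars].isnull().sum(axis=1)

# Rows that are missing two or more of the key variables
rows_missing_several = df.loc[missing_by_row >= 2]

print(missing_by_column)
print(rows_missing_several)
```

Passing axis=1 to sum is what switches the count from columns to rows. The threshold of two is arbitrary here and would depend on how many key variables the analysis relies on.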
We also need good strategies for dealing with missing values before we begin statistical modeling, since those models do not typically handle missing values flexibly. We introduce imputation strategies in this recipe and go into more detail in subsequent recipes in this chapter.
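To give a sense of what a simple imputation looks like, here is a hedged sketch that fills missing values with the column median. The data and the median_age column name are hypothetical, and median imputation is only one of several possible strategies, not necessarily the one best suited to a given variable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column name is illustrative only
demo = pd.DataFrame({"median_age": [29.0, np.nan, 41.0, 35.0]})

# median() skips missing values by default
median_age = demo["median_age"].median()

# Store the imputed values in a new column so the original is preserved
demo["median_age_filled"] = demo["median_age"].fillna(median_age)

print(demo)
```

Writing the result to a new column keeps the original values available, which makes it easier to compare the distributions before and after imputation.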
Getting ready
We will work with cumulative data on coronavirus cases and deaths by country. The DataFrame has other relevant information, including population density, age, and GDP.
Note
Our World in Data provides COVID-19 public use data at https://ourworldindata.org/coronavirus-source-data. The data used in this recipe was downloaded on June 1, 2020. The Covid case...