Identifying missing values
Identifying missing values is a routine part of an analyst's workflow, so any tool we use needs to make it easy to check for them regularly. Fortunately, pandas makes identifying missing values quite simple.
We will be working with the National Longitudinal Survey (NLS) in this chapter. The NLS has one observation per survey respondent. Data for employment, earnings, and college enrollment for each year are stored in columns with suffixes representing the year, such as weeksworked16 and weeksworked17 for weeks worked in 2016 and 2017, respectively.
Note
We will also work with the COVID-19 data again. This dataset has one observation for each country that specifies the total COVID-19 cases and deaths, as well as some demographic data for each country.
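Before loading the real files, it may help to see what a missing-value check looks like on data shaped like the NLS, with one row per respondent and one column per year. The following is a minimal sketch, not the chapter's own code: the values are made up, and the personid index name is a hypothetical stand-in for whatever identifier the actual file uses. Only the weeksworked16 and weeksworked17 column names come from the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that mimics the NLS layout described above.
# The values and the "personid" index name are invented for illustration.
toy = pd.DataFrame(
    {
        "weeksworked16": [52, np.nan, 40, 0],
        "weeksworked17": [48, np.nan, np.nan, 12],
    },
    index=pd.Index([100, 101, 102, 103], name="personid"),
)

# Count missing values in each column
missing_by_column = toy.isnull().sum()

# Count missing values for each respondent (row)
missing_by_row = toy.isnull().sum(axis=1)

# Select respondents who are missing every weeksworked value
all_missing = toy.loc[toy.isnull().all(axis=1)]

print(missing_by_column)
print(missing_by_row)
print(all_missing.index.tolist())
```

Summing the Boolean result of isnull down the columns shows which variables have gaps, while summing across the rows with axis=1 shows which respondents do. Note that a value of 0 weeks worked is a valid response and is not counted as missing.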
Follow these steps to identify our missing values:
- Let's start by loading the NLS and COVID-19 data:
import pandas as pd
import numpy as np
nls97 = pd.read_csv...