Handling Missing Values
So far, you have looked at a variety of data quality issues in datasets. Now it is time to discuss another issue that occurs quite frequently: missing values. As the name suggests, this means that some observations have no recorded value for one or more variables.
The pandas package provides a method that we can use to identify missing values in a DataFrame: .isna(). Let's see it in action on the Online Retail dataset. First, you need to import pandas and load the data into a DataFrame:
import pandas as pd
file_url = 'https://github.com/PacktWorkshops/The-Data-Science-Workshop/blob/master/Chapter10/dataset/Online%20Retail.xlsx?raw=true'
df = pd.read_excel(file_url)
The .isna() method returns a DataFrame of the same shape as the original, with a Boolean value for each cell that states whether the value is missing (True) or not (False):
df.isna()
You should get the following output:
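The exact output depends on the dataset, but the behavior of .isna() is easy to check on a small example. The following is a minimal sketch that assumes a hypothetical toy DataFrame (toy_df, with made-up values that are not taken from the Online Retail dataset). It shows the Boolean mask returned by .isna() and one common way to summarize it, by counting the missing values in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only (not the Online Retail dataset)
toy_df = pd.DataFrame({
    "Description": ["WHITE MUG", None, "RED LAMP"],
    "CustomerID": [17850.0, np.nan, 13047.0],
})

# Boolean mask with the same shape as toy_df: True where a value is missing
mask = toy_df.isna()
print(mask)

# True counts as 1 when summed, so this gives the number of
# missing values in each column
missing_per_column = mask.sum()
print(missing_per_column)
```

Both None and NaN are treated as missing here, which is why the second row is flagged in both columns.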