Handling Missing Values
So far, you have looked at a variety of issues that affect datasets. Now it is time to discuss another one that occurs quite frequently: missing values. As the name suggests, this means that some observations have no value recorded for one or more variables.
The pandas package provides a method that we can use to identify missing values in a DataFrame: .isna(). Let's see it in action on the Online Retail dataset. First, you need to import pandas and load the data into a DataFrame:
import pandas as pd
file_url = 'https://github.com/PacktWorkshops/'\
           'The-Data-Science-Workshop/blob/'\
           'master/Chapter10/dataset/'\
           'Online%20Retail.xlsx?raw=true'
df = pd.read_excel(file_url)
The .isna() method returns a pandas...
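To see what .isna() does without downloading the full file, here is a minimal sketch on a small hand-made DataFrame. The name toy_df and its values are hypothetical and are not taken from the Online Retail dataset; only the column names are borrowed for illustration. The call produces a Boolean mask with the same shape as the DataFrame, and summing that mask counts the missing values in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only (not the Online Retail dataset)
toy_df = pd.DataFrame({
    'Description': ['mug', None, 'lamp'],
    'CustomerID': [17850.0, np.nan, 13047.0],
    'Quantity': [6, 2, 8],
})

# True wherever a value is missing (None or NaN), False otherwise
missing_mask = toy_df.isna()

# Summing the Boolean mask counts the missing values per column
missing_per_column = missing_mask.sum()

print(missing_mask)
print(missing_per_column)
```

The same two calls should work unchanged on the df loaded above, assuming the file has been read successfully.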