Analyzing the Content of a Categorical Variable
Now that we've got a good feel for the kind of information contained in the online retail dataset, we want to dig a little deeper into each of its columns:
import pandas as pd
file_url = 'https://github.com/PacktWorkshops/The-Data-Science-Workshop/blob/master/Chapter10/dataset/Online%20Retail.xlsx?raw=true'
df = pd.read_excel(file_url)
For instance, we would like to know how many different values are contained in each of the variables by calling the nunique() method. This is particularly useful for a categorical variable with a limited number of values, such as Country:
df['Country'].nunique()
You should get the following output:
38
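As a side note, nunique() skips missing values by default, and it can also be called on a whole DataFrame to get one count per column. The following is a minimal sketch on a small, made-up DataFrame (the toy variable and its values are hypothetical and are not taken from the online retail dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the retail dataset (illustration only)
toy = pd.DataFrame({
    'Country': ['France', 'Germany', 'France', None],
    'Quantity': [6, 6, 8, 2],
})

# Series.nunique() counts distinct values; missing values are ignored by default
n_countries = toy['Country'].nunique()

# Passing dropna=False counts the missing value as its own category
n_with_missing = toy['Country'].nunique(dropna=False)

# DataFrame.nunique() returns a Series with one distinct-value count per column
per_column = toy.nunique()

print(n_countries, n_with_missing)
print(per_column)
```

On the toy data, the Country column contains two distinct non-missing values, and three once the missing entry is counted. Comparing the two counts is a quick way to check whether a categorical column contains missing values.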
We can see that there are 38 different countries in this dataset. It would be great if we could get a list of all the values in this column. Thankfully, the pandas package provides a method to get these results: unique():
df['Country'].unique()
You...