Analyzing the Content of a Categorical Variable
Now that we've got a good feel for the kind of information contained in the online retail dataset
, we want to dig a little deeper into each of its columns:
import pandas as pd file_url = 'https://github.com/PacktWorkshops/'\ Â Â Â Â Â Â Â Â Â Â Â 'The-Data-Science-Workshop/blob'\ Â Â Â Â Â Â Â Â Â Â Â '/master/Chapter10/dataset/'\ Â Â Â Â Â Â Â Â Â Â Â 'Online%20Retail.xlsx?raw=true' df = pd.read_excel(file_url)
For instance, we would like to know how many different values are contained in each of the variables by calling the nunique()
method. This is particularly useful for a categorical variable with a limited number of values, such as Country
:
df['Country'].nunique()
You should get the following output:
38
We can see that there are...