Generating frequencies for categorical variables
Many years ago, a very seasoned researcher said to me, "90% of what we're going to find, we'll see in the frequency distributions." That message has stayed with me. The more one-way and two-way frequency distributions (crosstabs) I do on a DataFrame, the better I understand it. We will do one-way distributions in this recipe, and crosstabs in subsequent recipes.
Getting ready…
We continue our work with the NLS. We will also be doing a fair bit of column selection using filter methods. It is not necessary to review the recipe in this chapter on column selection, but it might be helpful.
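As a quick refresher on that kind of column selection, here is a minimal sketch of `DataFrame.filter`. The DataFrame and its column names are made up for illustration; they are not read from the NLS file.

```python
import pandas as pd

# Hypothetical stand-in data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "gender": ["Female", "Male"],
    "govprovidejobs": ["1. Definitely", "2. Probably"],
    "govpricecontrols": ["2. Probably", "1. Definitely"],
    "weeksworked00": [52, 40],
    "weeksworked01": [50, 0],
})

# Select columns whose name contains a substring.
gov_cols = df.filter(like="gov")

# Select columns whose name matches a regular expression.
weeks_cols = df.filter(regex=r"^weeksworked\d{2}$")

# Select columns by exact name; names not present are silently skipped.
named_cols = df.filter(items=["gender", "weeksworked00"])

print(list(gov_cols.columns))
print(list(weeks_cols.columns))
print(list(named_cols.columns))
```

On a DataFrame, `filter` works on column labels by default, so none of these calls look at the values in the rows.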
How to do it…
We use pandas tools to generate frequencies, particularly the very handy value_counts:
- Load the pandas library and the nls97 file. Also, convert the columns with object data type to category data type:
>>> import pandas as pd
>>> nls97 = pd.read_csv("data/nls97.csv...
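The step above, followed by a one-way frequency with `value_counts`, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the recipe's own code: the small DataFrame and its column names are hypothetical stand-ins for the NLS data, and the conversion loop is one of several ways to change object columns to category.

```python
import pandas as pd

# Hypothetical stand-in data; not the NLS file used in the recipe.
df = pd.DataFrame({
    "maritalstatus": pd.Series(
        ["Married", "Never-married", "Married", None, "Divorced", "Married"],
        dtype="object",
    ),
    "gender": pd.Series(
        ["Female", "Male", "Female", "Male", "Female", "Male"],
        dtype="object",
    ),
    "weeksworked": [52, 40, 0, 12, 52, 30],
})

# Convert each object-dtype column to the category dtype.
objcols = df.select_dtypes(include=["object"]).columns
for col in objcols:
    df[col] = df[col].astype("category")

# One-way frequencies; missing values are excluded by default.
counts = df["maritalstatus"].value_counts()

# Include missing values as their own row.
counts_with_na = df["maritalstatus"].value_counts(dropna=False)

# Show shares of non-missing values instead of counts.
shares = df["maritalstatus"].value_counts(normalize=True)

print(counts)
print(counts_with_na)
print(shares)
```

Passing `dropna=False` is worth doing at least once per variable, since the default output hides how many values are missing.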