One-hot encoding with pd.get_dummies
Data analysis and machine learning applications commonly take data that is categorical in nature and convert it into a sequence of 0/1 values, as the latter can be more easily interpreted by numeric algorithms. This process is often called one-hot encoding, and the outputs are typically referred to as dummy indicators.
How to do it
Let’s start with a small pd.Series containing a discrete set of colors:
import pandas as pd

ser = pd.Series([
    "green",
    "brown",
    "blue",
    "amber",
    "hazel",
    "amber",
    "green",
    "blue",
    "green",
], name="eye_colors", dtype=pd.StringDtype())
ser
0    green
1    brown
2     blue
3    amber
4    hazel
5    amber
6    green
7     blue
8    green
Name: eye_colors, dtype: string
Passing this as an argument to pd.get_dummies will create a like-indexed pd.DataFrame with a...
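As a rough illustration of the call described above, the following sketch rebuilds the same pd.Series, passes it to pd.get_dummies, and checks a few properties of the result. The variable name dummies and the specific checks are assumptions added here for illustration and are not part of the original recipe. The data type of the indicator columns depends on the pandas version (boolean by default since pandas 2.0, small integers before that), so the sketch avoids relying on it.

```python
import pandas as pd

# Same data as in the recipe above
ser = pd.Series(
    ["green", "brown", "blue", "amber", "hazel",
     "amber", "green", "blue", "green"],
    name="eye_colors",
    dtype=pd.StringDtype(),
)

# "dummies" is a hypothetical name chosen for this sketch
dummies = pd.get_dummies(ser)

# One indicator column per distinct color found in the input
print(dummies.columns.tolist())

# The result shares its row index with the original pd.Series
print(dummies.index.equals(ser.index))

# Each row flags exactly one color, which is what makes the encoding "one-hot"
print((dummies.sum(axis=1) == 1).all())

# The indicator for a given color matches a direct comparison on the input
print((dummies["green"].astype(bool) == (ser == "green")).all())
```

The last check is one way to read the output: each column answers the question "is this row equal to that category?" for a single category.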