14.6 Converting categorical data
The Gender column contains categorical data rather than numeric data. The items in the column belong to a fixed set of values, which are usually strings. In this case, the values are 'F' and 'M'. While we can check whether an item is equal to one of these values, it is often easier to convert the categorical column into multiple numeric “dummy columns”, one per category, each containing 1 where the row has that value and 0 elsewhere.
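To make the idea concrete, here is a minimal sketch of the manual alternative. It assumes pandas is installed and uses a hypothetical two-item series `gender` rather than the full dataset. Each category gets its own equality check, and the resulting booleans are converted to 0 and 1. Every result is, in effect, one dummy column.

```python
import pandas as pd

# Hypothetical sample: the two Gender values from the rows shown below
gender = pd.Series(["F", "M"], name="Gender")

# Manual approach: one comparison per category, converted from bool to 0/1
is_female = (gender == "F").astype(int)
is_male = (gender == "M").astype(int)

print(is_female.tolist())  # [1, 0]
print(is_male.tolist())    # [0, 1]
```

Writing one comparison per category becomes tedious when a column has many distinct values, which is why a single call that builds all the dummy columns at once is convenient.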
Here are the first two rows of df:
df.head(2)
          Locality  Postcode  Breed  Colour Gender
0  DANDENONG NORTH      3175  DOMSH     TAB      F
1  DANDENONG NORTH      3175  DOMLH  BLAWHI      M
and this is what we get when we use get_dummies on the Gender column:
pd.get_dummies(df, columns=["Gender"]).head(2)
          Locality  Postcode  Breed  Colour  Gender_F  Gender_M
0  DANDENONG NORTH      3175  DOMSH     TAB         1         0
1  DANDENONG NORTH      3175  DOMLH ...
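The sketch below reproduces this call on a small, self-contained frame. It assumes pandas is installed and builds a hypothetical two-row DataFrame that mirrors the df.head(2) output above rather than loading the real dataset. One detail worth knowing: since pandas 2.0, get_dummies returns bool columns (True/False) by default, so the sketch passes `dtype=int` to obtain the 0/1 values shown in the output above.

```python
import pandas as pd

# Hypothetical two-row frame mirroring df.head(2) shown above
df = pd.DataFrame({
    "Locality": ["DANDENONG NORTH", "DANDENONG NORTH"],
    "Postcode": [3175, 3175],
    "Breed": ["DOMSH", "DOMLH"],
    "Colour": ["TAB", "BLAWHI"],
    "Gender": ["F", "M"],
})

# Replace Gender with one 0/1 column per category; the untouched columns
# come first and the new Gender_F / Gender_M columns are appended at the end.
# dtype=int requests 0/1; pandas 2.0+ would otherwise produce True/False.
dummies = pd.get_dummies(df, columns=["Gender"], dtype=int)
print(dummies)
```

The original Gender column is dropped and its values are spread across Gender_F and Gender_M, so each row has exactly one 1 among the dummy columns.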