A dummy variable is a binary variable that encodes one category of a categorical independent variable for use in regression analysis. It is also known as a Boolean, indicator, or binary variable. Dummy encoding converts a categorical variable with N distinct values into N–1 dummy variables; the omitted value acts as the baseline. Each dummy variable takes only the values 1 and 0, which indicate the presence and absence of the category.
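To make the N–1 idea concrete, here is a minimal plain-Python sketch. The helper `dummy_encode` and the colour data are hypothetical, and using the first category in sorted order as the baseline is an assumption made only for illustration.

```python
def dummy_encode(values):
    """Encode values as N-1 indicator columns (hypothetical helper).

    Assumption: the first category in sorted order is the baseline
    and gets no column of its own.
    """
    categories = sorted(set(values))
    baseline, kept = categories[0], categories[1:]
    # One dict per row: 1 if the row belongs to that category, else 0
    rows = [{c: int(v == c) for c in kept} for v in values]
    return baseline, rows


colors = ["red", "green", "blue", "green"]
baseline, encoded = dummy_encode(colors)

print(baseline)
for row in encoded:
    print(row)
```

A row whose indicators are all 0 belongs to the baseline category, which is why N–1 columns are enough to represent N values.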
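pandas offers the get_dummies() function to generate dummy variables. By default, it returns one column per distinct value; passing drop_first=True yields the N–1 encoding. Let's understand the get_dummies() function through an example: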
# Import pandas module
import pandas as pd

# Create pandas DataFrame
data = pd.DataFrame({'Gender': ['F', 'M', 'M', 'F', 'M']})

# Check the top-5 records
data.head()
This results in the following output:
|   | Gender |
|---|--------|
| 0 | F |
| 1 | M |
| 2 | M |
| 3 | F |
| 4 | M |
In the preceding code block, we have created the DataFrame with the Gender column and generated the dummy variable using the get_dummies() function...
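As a minimal sketch of the get_dummies() step, assuming the same Gender data as above, the call might look like the following. The names `dummies` and `dummies_n1` are illustrative rather than taken from the text, and `dtype=int` is passed so that the columns hold 1 and 0 rather than the Boolean values that recent pandas versions return by default.

```python
import pandas as pd

# Same data as in the example above
data = pd.DataFrame({'Gender': ['F', 'M', 'M', 'F', 'M']})

# Default behaviour: one indicator column per distinct value (N columns)
dummies = pd.get_dummies(data['Gender'], dtype=int)
print(dummies)

# drop_first=True drops the first category, leaving N-1 columns;
# here 'F' becomes the baseline and only the 'M' column remains
dummies_n1 = pd.get_dummies(data['Gender'], drop_first=True, dtype=int)
print(dummies_n1)
```

With two categories, the single remaining column is enough: a 0 in the M column identifies the baseline category F.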