Creating dummy variables
Creating dummy variables is a method of creating a separate variable for each category of a categorical variable. Although a categorical variable contains plenty of information and might show a causal relationship with the output variable, it can't be used in predictive models such as linear and logistic regression without some processing.
In our dataset, sex is a categorical variable with two categories: male and female. We can create two dummy variables out of this, as follows:
dummy_sex = pd.get_dummies(data['sex'], prefix='sex')
The result of this statement is as follows:
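As a minimal sketch of what the statement produces, the snippet below runs it on a small hypothetical DataFrame that stands in for the dataset used here; the four rows are invented for illustration only. It also passes dtype=int, which is an addition to the original call: recent pandas versions return True/False columns by default, so the argument is needed to get the 1 and 0 values.

```python
import pandas as pd

# Hypothetical stand-in for the dataset; only the 'sex' column matters here.
data = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# One column is created per category, named <prefix>_<category>.
# dtype=int requests 0/1 values (newer pandas defaults to True/False).
dummy_sex = pd.get_dummies(data["sex"], prefix="sex", dtype=int)

print(dummy_sex)
```

The resulting frame has the columns sex_female and sex_male, and every row contains exactly one 1 across the two.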
This process is called dummifying the variable. It creates two new variables that take a value of either 1 or 0, depending on the sex of the passenger. If the sex was female, sex_female would be 1 and sex_male would be 0. If the sex was male, sex_male would be 1 and sex_female would be 0. In general, all but one dummy variable...