Creating binary variables through one-hot encoding
One-hot encoding is a method used to represent categorical data, where each category is represented by a binary variable. The binary variable takes a value of 1 if the category is present, or 0 otherwise.
The following table shows the one-hot encoded representation of the Smoker variable with the categories of Smoker and Non-Smoker:
Figure 2.1 – One-hot encoded representation of the Smoker variable
As shown in Figure 2.1, from the Smoker variable, we can derive either a binary variable for Smoker, which takes the value of 1 for smokers, or a binary variable for Non-Smoker, which takes the value of 1 for those who do not smoke.
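To make the derivation concrete, here is a minimal sketch in plain Python. The five observations are hypothetical and chosen only for illustration, and the names smoker, is_smoker, and is_non_smoker are assumptions rather than anything defined in the text:

```python
# Hypothetical observations of the Smoker variable (illustrative only).
smoker = ["Smoker", "Non-Smoker", "Smoker", "Non-Smoker", "Non-Smoker"]

# One binary variable per category: 1 if the category is present, 0 otherwise.
is_smoker = [1 if value == "Smoker" else 0 for value in smoker]
is_non_smoker = [1 if value == "Non-Smoker" else 0 for value in smoker]

# Print each observation next to its two binary variables.
for value, s, n in zip(smoker, is_smoker, is_non_smoker):
    print(f"{value:<12}{s:>3}{n:>3}")
```

For every observation, exactly one of the two binary variables is 1, so each one is fully determined by the other.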
For the Color categorical variable with the values of red, blue, and green, we can create three variables called red, blue, and green. Each of these variables is assigned a value of 1 if the observation corresponds to the respective color, and 0 if it does not.
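The Color example can be sketched with pandas, assuming pandas is available. The DataFrame contents below are hypothetical, and pd.get_dummies is used here as one common way to produce the binary variables, not necessarily the approach taken later in the text:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"Color": ["red", "blue", "green", "blue", "red"]})

# get_dummies creates one column per category found in the variable;
# dtype=int yields 0/1 values instead of booleans.
dummies = pd.get_dummies(df["Color"], dtype=int)

print(dummies)
```

Each row of the result contains a single 1, in the column that matches the observation's color, and 0 in the remaining columns.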
A categorical...