One-Hot Encoding
So, what is one-hot encoding? Well, in machine learning, we sometimes have categorical input features such as name, gender, and color. Such features contain label values rather than numeric values, such as John and Tom for name, male and female for gender, and red, blue, and green for color. Here, blue is one such label for the categorical feature – color. All machine learning models can work with numeric data, but many machine learning models cannot work with categorical data because of the way their underlying algorithms are designed. For example, decision trees can work with categorical data, but logistic regression cannot.
In order to still make use of categorical features with models such as logistic regression, we transform such features into a usable numeric format. Figure 6.1 shows an example of what this transformation looks like:
Figure 6.2 shows how one-hot encoding changes the dataset, once...