One-Hot Encoding
One-hot encoding binarizes a categorical variable: a variable with n unique values is transformed into n binary columns, while the number of rows in the dataset stays the same. The following table shows how the wind direction column is transformed into five binary columns. For example, row 1 has the value North, so it gets a 1 in the corresponding column, Direction_N, and a 0 in the remaining columns. The other rows are encoded the same way. Note that the direction West does not appear in these five sample rows; the column Direction_W exists because the value does occur in the larger dataset.
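The transformation described above can be sketched with pandas. This is a minimal illustration, not the original table: the sample values, the short codes N, E, S, W, and the name all_directions are assumptions made for the example. Declaring the full set of categories up front is one way to get a column for a value (here W) that is missing from the sample rows.

```python
import pandas as pd

# Hypothetical five-row sample; these values are NOT taken from the source table.
df = pd.DataFrame({"Direction": ["N", "S", "E", "N", "S"]})

# Assumed full set of categories in the larger dataset. Declaring it ensures
# that a value absent from the sample ("W") still gets its own column.
all_directions = ["N", "E", "S", "W"]
df["Direction"] = pd.Categorical(df["Direction"], categories=all_directions)

# One binary column per category; the number of rows is unchanged.
encoded = pd.get_dummies(df, columns=["Direction"], prefix="Direction", dtype=int)
print(encoded)
```

Each row ends up with exactly one 1, in the column matching its original value, and the Direction_W column contains only zeros for this sample.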
One primary reason for converting categorical variables (such as the one shown in the previous table) to binary columns is that many machine learning algorithms can only deal with numerical values....