Binary encoding is an alternative categorical encoding technique that uses binary code, that is, a sequence of zeroes and ones, to represent the different categories of the variable. How does it work? First, the categories are arbitrarily replaced by ordinal numbers, as shown in the intermediate step of the following table. Then, those numbers are converted into binary code. For example, the integer 1 can be represented as the sequence 01, the integer 2 as 10, the integer 3 as 00, and 4 as 11. The digits in the two positions of the binary string become the columns, which are the encoded representation of the original variable:
Color | Intermediate step | 1st | 2nd |
Blue | 1 | 0 | 1 |
Red | 2 | 1 | 0 |
Green | 3 | 0 | 0 |
Yellow | 4 | 1 | 1 |
Â
Binary encoding encodes the data in fewer dimensions than one-hot encoding. In our example, the color variable would be encoded...