Performing binary encoding
Binary encoding uses binary code – that is, a sequence of zeroes and ones – to represent the different categories of the variable. How does it work? First, the categories are arbitrarily replaced with ordinal numbers, as shown in the intermediate step of the following table. Then, those numbers are converted into binary code. For example, writing the least significant digit first, integer 1 can be represented with the sequence 1-0, integer 2 with 0-1, integer 3 with 1-1, and integer 0 with 0-0. Each digit position of the binary string becomes a column, and these columns are the encoded representation of the original variable:
Figure 2.10 – Table showing the steps required for binary encoding the color variable
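The two steps in the table can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this recipe: the function name `binary_encode` and the example colors are hypothetical. Categories receive ordinals in order of first appearance (one arbitrary choice among many), and each ordinal is split into `(k - 1).bit_length()` binary digits, written least significant digit first to match the sequences in the text.

```python
def binary_encode(values):
    """Binary-encode a list of categorical values (illustrative sketch).

    Step 1: map each category to an integer 0..k-1 in order of first
    appearance (an arbitrary ordering, as described in the text).
    Step 2: write each integer in base 2, least significant digit first,
    so that 1 -> [1, 0], 2 -> [0, 1], 3 -> [1, 1], and 0 -> [0, 0].
    """
    # Step 1: arbitrary ordinal mapping
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)

    k = len(mapping)
    # Number of binary columns needed to represent ordinals 0..k-1;
    # keep at least one column so a single category still gets encoded
    n_bits = max(1, (k - 1).bit_length())

    # Step 2: split each ordinal into its binary digits (one column per digit)
    rows = [[(mapping[value] >> i) & 1 for i in range(n_bits)] for value in values]
    return mapping, rows


# Hypothetical Color variable with four categories
colors = ["blue", "red", "green", "yellow", "red"]
mapping, rows = binary_encode(colors)
print(mapping)  # {'blue': 0, 'red': 1, 'green': 2, 'yellow': 3}
for color, bits in zip(colors, rows):
    print(color, bits)
# Compare the number of columns with one-hot encoding into k-1 variables
print(len(rows[0]), "binary columns vs", len(mapping) - 1, "one-hot columns")
```

With four categories, the sketch produces two binary columns, whereas one-hot encoding into k-1 variables would need three. Libraries that implement binary encoding may order categories and digits differently, so the exact column values can differ from this sketch.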
Binary encoding encodes the data in fewer dimensions than one-hot encoding. In our example, one-hot encoding would represent the Color variable with k-1 binary variables – that is, three variables – but with binary encoding...