Now that we have gone through all of the variable groups, we are almost ready to build our predictive models. First, however, we must expand each of our categorical variables into a set of binary variables (a technique known as one-hot encoding, or a 1-of-K representation) and convert our data into a format that the scikit-learn methods accept as input. Let's do that next.
Final preprocessing steps
One-hot encoding
Many classifiers in the scikit-learn library require categorical variables to be one-hot encoded. In one-hot encoding, or a 1-of-K representation, a categorical variable that has more than two possible values is recorded as multiple binary variables, one for each possible value. For any given observation, exactly one of these binary variables is 1 and the rest are 0.
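As a minimal sketch of the idea, the snippet below one-hot encodes a single categorical column with the pandas `get_dummies()` function. The column name `arrival_mode` and its values are hypothetical and chosen only for illustration; they are not taken from our dataset.

```python
import pandas as pd

# Hypothetical toy data: the column name and category values are
# illustrative assumptions, not part of the dataset used in this chapter.
df = pd.DataFrame(
    {"arrival_mode": ["ambulance", "walk-in", "public", "walk-in"]}
)

# Expand the categorical column into one binary (0/1) column per category.
# Each new column is named <original column>_<category value>.
encoded = pd.get_dummies(df, columns=["arrival_mode"], dtype=int)

print(encoded)
```

Each row of `encoded` contains a single 1, in the column that corresponds to that row's original category, and 0 everywhere else.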
For example, let's say that we have five patients...