Encoding categorical features: ordinal encoding
Categorical features can be either nominal or ordinal. Gender and marital status are nominal because their values imply no order. For example, "never married" is not a higher value than "divorced".
When a categorical feature is ordinal, however, we want the encoding to capture the ranking of the values. For example, if we have a feature that has the values low, medium, and high, one-hot encoding would lose this ordering. Instead, a transformed feature with values of 1, 2, and 3 for low, medium, and high, respectively, would be better. We can accomplish this with ordinal encoding.
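A minimal sketch of the idea with pandas, assuming a hypothetical `risk` feature with the labels low, medium, and high. An explicit dictionary fixes the ranking so that the encoded values are 1, 2, and 3:

```python
import pandas as pd

# Hypothetical feature; the name "risk" and its values are illustrative only
risk = pd.Series(["medium", "low", "high", "low"], name="risk")

# Explicit mapping that preserves the ranking low < medium < high
rank = {"low": 1, "medium": 2, "high": 3}
risk_encoded = risk.map(rank)

print(risk_encoded.tolist())  # [2, 1, 3, 1]
```

Because the mapping is written out by hand, the order never depends on how the labels happen to sort alphabetically.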
The college enrollment feature in the NLS dataset can be considered ordinal. Its values range from "1. Not enrolled" to "3. 4-year college". We should use ordinal encoding to prepare it for modeling, which we do next.
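The following is a sketch on a few made-up rows, not the NLS data itself. The column name `colenroct99` and the middle label "2. 2-year college" are assumptions here. Because each label starts with its rank ("1.", "2.", "3."), the default alphabetical sort used by scikit-learn's `OrdinalEncoder` happens to match the intended order:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows mimicking the enrollment labels; column name is assumed
nls = pd.DataFrame({
    "colenroct99": ["1. Not enrolled", "3. 4-year college",
                    "2. 2-year college", "1. Not enrolled"]
})

# categories="auto" (the default) sorts the unique values; the numeric
# prefixes make the alphabetical order equal to the enrollment order
enc = OrdinalEncoder()
nls["colenrl_ord"] = enc.fit_transform(nls[["colenroct99"]])[:, 0]

print(list(enc.categories_[0]))
print(nls["colenrl_ord"].tolist())  # [0.0, 2.0, 1.0, 0.0]
```

Note that the encoder's codes start at 0 rather than 1. Only the relative order matters for most models.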
Getting ready
We will use the OrdinalEncoder class from scikit-learn in this recipe.
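One detail worth checking before using `OrdinalEncoder`: by default it sorts category labels alphabetically, which does not always match their meaning. This sketch uses hypothetical low/medium/high labels to show the difference between the default behavior and passing the order explicitly through the `categories` parameter:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-feature input (OrdinalEncoder expects 2-D data)
X = np.array([["low"], ["high"], ["medium"], ["low"]], dtype=object)

# Default: alphabetical order, so high=0, low=1, medium=2 (wrong ranking)
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(X).ravel()
print(default_codes)  # [1. 0. 2. 1.]

# Explicit order: low=0, medium=1, high=2 (correct ranking)
ordered_enc = OrdinalEncoder(categories=[["low", "medium", "high"]])
ordered_codes = ordered_enc.fit_transform(X).ravel()
print(ordered_codes)  # [0. 2. 1. 0.]
```

Passing `categories` explicitly is the safer choice whenever the labels lack a prefix that sorts in rank order.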
How to do it...
- College enrollment...