Performing ordinal encoding based on the target value
In the previous recipe, we replaced categories with arbitrarily assigned integers. We can also assign the integers based on the target values. To do this, we first calculate the mean value of the target per category. Next, we order the categories from the one with the lowest to the one with the highest target mean. Finally, we assign integers to the ordered categories, from 0 for the first category up to k-1 for the last category, where k is the number of distinct categories.
This encoding method creates a monotonic relationship between the encoded variable and the mean of the target, which makes the variable better suited for use in linear models.
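The three steps described above can be sketched with plain pandas. This is a minimal illustration rather than the recipe's own code: the toy DataFrame, the column names color and target, and the variable names are all assumptions made up for the example.

```python
import pandas as pd

# Toy data invented for illustration; not the recipe's dataset.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green"],
    "target": [1, 0, 1, 0, 0, 1],
})

# Step 1: mean value of the target per category.
target_means = df.groupby("color")["target"].mean()

# Step 2: order the categories from lowest to highest target mean.
ordered_categories = target_means.sort_values().index

# Step 3: assign the integers 0 to k-1 to the ordered categories.
ordinal_mapping = {
    category: rank for rank, category in enumerate(ordered_categories)
}

# Replace each category with its integer.
df["color_encoded"] = df["color"].map(ordinal_mapping)

print(ordinal_mapping)  # {'blue': 0, 'red': 1, 'green': 2}
```

In this toy data, blue has the lowest target mean (0.0) and green the highest (1.0), so the encoded values increase with the target mean. In practice, the mapping would normally be learned from the training set only and then applied to new data, so that target information from the test set does not leak into the encoding.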
In this recipe, we will encode categories based on the target value using pandas and feature-engine.
How to do it...
First, let’s import the necessary Python libraries and get the dataset ready:
- Import the required...