Replacing categories with counts or the frequency of observations
In count or frequency encoding, we replace the categories with the count or the fraction of observations showing that category. That is, if 10 out of 100 observations show the blue category for the Color variable, we would replace blue with 10 when doing count encoding, or with 0.1 if performing frequency encoding. These encoding methods are useful when there is a relationship between the category frequency and the target. For example, in sales, the frequency of a product may indicate its popularity.
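As a minimal sketch of the arithmetic described above, the following uses a made-up Color variable in which 10 of 100 observations are blue. The DataFrame contents and the column names Color_count and Color_freq are assumptions for illustration only and are not part of the recipe's dataset.

```python
import pandas as pd

# Hypothetical toy data: 10 of 100 observations show the "blue" category.
df = pd.DataFrame({"Color": ["blue"] * 10 + ["red"] * 60 + ["green"] * 30})

# Count encoding: number of observations per category.
counts = df["Color"].value_counts().to_dict()

# Frequency encoding: fraction of observations per category.
freqs = df["Color"].value_counts(normalize=True).to_dict()

# Replace each category with its count or its frequency.
df["Color_count"] = df["Color"].map(counts)
df["Color_freq"] = df["Color"].map(freqs)

print(df.drop_duplicates())
```

Here blue maps to 10 under count encoding and to 0.1 under frequency encoding, matching the example in the text.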
Note
If two different categories are present in the same number of observations, they will be replaced by the same value, which may lead to information loss.
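The information loss mentioned in the note can be illustrated with a small, made-up example in which two categories occur equally often. The data below is hypothetical and chosen only so that blue and red collide.

```python
import pandas as pd

# Hypothetical toy data: "blue" and "red" both appear 3 times, "green" 4 times.
df = pd.DataFrame({"Color": ["blue"] * 3 + ["red"] * 3 + ["green"] * 4})

# Count-encode the variable by mapping each category to its count.
counts = df["Color"].value_counts()
encoded = df["Color"].map(counts)

# Compare the number of distinct categories before and after encoding.
n_categories = df["Color"].nunique()
n_codes = encoded.nunique()

print(n_categories, n_codes)
```

After encoding, blue and red share the value 3, so the three original categories collapse into two distinct values and a model can no longer tell them apart.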
In this recipe, we will perform count and frequency encoding using pandas and feature-engine.
How to do it...
We’ll start by encoding one variable with pandas, and then we’ll automate the process...