Implementing target mean encoding
Mean encoding or target encoding maps each category to the probability estimate of the target attribute. If the target is binary, the numerical mapping is the posterior probability of the target conditioned to the value of the category. If the target is continuous, the numerical representation is given by the expected value of the target given the value of the category.
In its simplest form, the numerical representation for each category is given by the mean value of the target variable for a particular category group. For example, if we have a City
variable, with the categories of London
, Manchester
, and Bristol
, and we want to predict the default rate (the target takes values of 0
and 1
); if the default rate for London
is 30%, we replace London
with 0.3
; if the default rate for Manchester
is 20%, we replace Manchester
with 0.2
; and so on. If the target is continuous – say we want to predict income – then we would replace London
,...