Encoding with Weight of Evidence
Weight of Evidence (WoE) was developed primarily for the credit and financial industries, where it is used to screen variables, support exploratory analysis, and build more predictive linear models for evaluating the risk of loan default.
The WoE is computed as the natural logarithm of the basic odds ratio:
WoE = ln( p(positive) / p(negative) )
Here, positive and negative refer to the values of the target being 1 or 0, respectively. The proportion of positive cases per category is the number of positive cases in that category divided by the total number of positive cases in the training set. Likewise, the proportion of negative cases per category is the number of negative cases in that category divided by the total number of negative cases in the training set.
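The calculation can be sketched with pandas as follows. This is a minimal illustration, not a definitive implementation: the toy DataFrame, the column names category and target, and the output column category_woe are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy training set: one categorical feature and a binary target.
df = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "target":   [1,   1,   1,   0,   1,   0,   0,   0,   1,   1,   0,   0],
})

# Count negative (column 0) and positive (column 1) cases per category.
counts = pd.crosstab(df["category"], df["target"])

# Positives in the category divided by all positives in the training set.
p_positive = counts[1] / counts[1].sum()

# Negatives in the category divided by all negatives in the training set.
p_negative = counts[0] / counts[0].sum()

# WoE per category: natural log of the ratio of the two proportions.
woe = np.log(p_positive / p_negative)

# Replace each category with its WoE value.
df["category_woe"] = df["category"].map(woe)

print(woe)
```

In this toy data, category A has more positives than negatives, B has the reverse, and C is balanced. Note that a category with no positive or no negative cases makes the ratio zero or undefined, so the logarithm cannot be evaluated as a finite number; such categories need to be handled before encoding.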
WoE has the following characteristics:
- WoE = 0 if p(positive) / p(negative) = 1; that is, if the category holds the same share of positive and negative cases, so it says nothing about the outcome
- WoE > 0 if p(positive) > p(negative)
- WoE < 0 if p(negative) > p(positive)
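These three cases can be checked numerically. The helper weight_of_evidence below is a hypothetical name introduced for illustration, and the proportions passed to it are made-up values.

```python
import math

def weight_of_evidence(p_positive: float, p_negative: float) -> float:
    """Natural log of the ratio of positive to negative proportions."""
    return math.log(p_positive / p_negative)

# Equal proportions: the ratio is 1, so WoE is 0.
print(weight_of_evidence(0.25, 0.25))

# Larger share of positives: WoE is above 0.
print(weight_of_evidence(0.40, 0.10) > 0)

# Larger share of negatives: WoE is below 0.
print(weight_of_evidence(0.10, 0.40) < 0)
```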
This allows us to...