Let’s start with a filter method to reduce the number of variables as a first step. To do so, we will measure the predictive power of each variable, that is, its ability, taken on its own, to classify the target variable correctly.
In this case, we try to find variables that differentiate correctly between solvent and non-solvent banks. To measure the predictive power of a variable, we use a metric named Information Value (IV).
Specifically, given a variable grouped into n groups, each with a certain distribution of good banks and bad banks (in our case, solvent and non-solvent banks), the information value for that predictor can be calculated as follows:
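In its commonly used form, assumed here to be the one intended, the information value sums over the groups the difference between the share of good banks and the share of bad banks in each group, weighted by the logarithm of their ratio. The symbols DistGood and DistBad are names chosen for this illustration:

$$
IV = \sum_{i=1}^{n} \left(\mathrm{DistGood}_i - \mathrm{DistBad}_i\right) \cdot \ln\!\left(\frac{\mathrm{DistGood}_i}{\mathrm{DistBad}_i}\right)
$$

Here DistGood_i is the fraction of all good (solvent) banks that fall in group i, and DistBad_i is the fraction of all bad (non-solvent) banks in that group. A minimal Python sketch of this calculation follows. The function name `information_value` and the counts are hypothetical and serve only to illustrate the arithmetic; they do not come from the data set used in this chapter.

```python
import math


def information_value(good_counts, bad_counts):
    """Information Value of a grouped predictor (standard definition, assumed).

    good_counts[i] and bad_counts[i] are the numbers of good (solvent) and
    bad (non-solvent) observations that fall in group i.
    """
    if len(good_counts) != len(bad_counts):
        raise ValueError("both lists need one entry per group")

    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    if total_good == 0 or total_bad == 0:
        raise ValueError("each class needs at least one observation")

    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        if good == 0 or bad == 0:
            # The log ratio is undefined for an empty class; in practice such
            # groups are usually merged with a neighbour before computing IV.
            raise ValueError("every group needs both good and bad observations")
        dist_good = good / total_good  # share of all good banks in this group
        dist_bad = bad / total_bad     # share of all bad banks in this group
        iv += (dist_good - dist_bad) * math.log(dist_good / dist_bad)
    return iv


# Hypothetical counts for a variable grouped into three bins.
solvent = [400, 300, 100]
non_solvent = [20, 30, 50]
iv = information_value(solvent, non_solvent)
print(round(iv, 3))
```

Each term of the sum is non-negative, because the difference and the log ratio always share the same sign. The IV is therefore zero only when good and bad banks are distributed identically across the groups, and it grows as the two distributions diverge.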
The IV statistic is generally interpreted according to its value:
- < 0.02: The variable of analysis does not accurately separate the classes in the target variable
- 0.02 to 0.1: The variable has a weak...