Let’s start with a filter method as a first step to reduce the number of variables. To do that, we measure the predictive power of each variable, that is, its individual ability to classify the target variable correctly.
In this case, we look for variables that discriminate well between solvent and non-solvent banks. To measure the predictive power of a variable, we use a metric called Information Value (IV).
Specifically, given a variable grouped into n groups, each with its own distribution of good and bad banks (in our case, solvent and non-solvent banks), the information value of that predictor can be calculated as follows:
![](https://static.packt-cdn.com/products/9781838644338/graphics/assets/238254df-559f-4acc-b4cc-5e7c7bfb3b94.png)
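To make the calculation concrete, here is a minimal Python sketch. It assumes the commonly used definition of IV: for each group, take the share of all good banks and the share of all bad banks that fall in that group, multiply their difference by the natural log of their ratio (the weight of evidence), and sum over the groups. The function name `information_value`, the input layout, and the example counts are hypothetical and chosen only for illustration.

```python
import math


def information_value(groups):
    """Compute IV from a list of (n_good, n_bad) counts, one pair per group.

    Assumes the common definition:
        IV = sum((dist_good_i - dist_bad_i) * ln(dist_good_i / dist_bad_i))
    """
    total_good = sum(n_good for n_good, _ in groups)
    total_bad = sum(n_bad for _, n_bad in groups)

    iv = 0.0
    for n_good, n_bad in groups:
        dist_good = n_good / total_good  # share of all good banks in this group
        dist_bad = n_bad / total_bad     # share of all bad banks in this group
        if dist_good == 0 or dist_bad == 0:
            # The log ratio is undefined for an empty class; skipping the group
            # is an assumption made here (smoothing the counts is another option).
            continue
        woe = math.log(dist_good / dist_bad)  # weight of evidence of the group
        iv += (dist_good - dist_bad) * woe
    return iv


# Hypothetical counts of (solvent, non-solvent) banks in three groups
example_groups = [(400, 10), (300, 30), (100, 60)]
print(round(information_value(example_groups), 4))
```

A variable whose groups all contain the same proportion of good and bad banks yields an IV of 0, and the value grows as the two distributions diverge across groups.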
The IV statistic is generally interpreted depending on its value:
- < 0.02: The variable of analysis does not accurately separate the classes in the target variable
- 0.02 to 0.1: The variable has a weak...