Practicing PFI
The concept of PFI is much easier to explain than any model-specific feature importance method! It merely measures the increase in prediction error once the values of each feature have been shuffled. The theory behind PFI is based on the logic that if a feature has a relationship with the target variable, shuffling will disrupt it and increase the error. On the other hand, if the feature doesn't have a strong relationship with the target variable, the prediction error won't increase by much, if at all. Ranking features by how much shuffling them increases the error therefore reveals which ones are most important to the model.
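To make the procedure concrete, here is a minimal from-scratch sketch of PFI for a regression model. The function name `permutation_importance_scratch` is hypothetical, mean squared error is just one possible choice of error metric, and `X` is assumed to be a NumPy array:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def permutation_importance_scratch(model, X, y, n_repeats=5, random_state=0):
    """Score each feature by how much shuffling its column
    increases the model's prediction error (illustrative helper)."""
    rng = np.random.default_rng(random_state)
    # Baseline error with all features intact
    baseline = mean_squared_error(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only feature j, breaking its link to the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(mean_squared_error(y, model.predict(X_perm)))
        # Importance = average error increase over the baseline
        importances[j] = np.mean(errors) - baseline
    return importances
```

Repeating the shuffle several times and averaging smooths out the randomness of any single permutation.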
In addition to being a model-agnostic method, PFI can be used with unseen data such as the test dataset, which is a massive advantage. In this case, because the Random Forest and Gradient Boosting Trees models are overfitting, how reliable can feature importance derived from their intrinsic parameters be? It tells you what the models learned from the training data, not necessarily what generalizes to unseen data.
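In practice, you don't have to implement the shuffling yourself; for instance, scikit-learn's `permutation_importance` can evaluate any fitted estimator on held-out data. The sketch below assumes hypothetical variables `rf_model`, `X_test`, and `y_test` from an earlier fitting step:

```python
from sklearn.inspection import permutation_importance

# Scoring on the test set keeps the importance estimates honest
# even when the model overfits the training data.
result = permutation_importance(
    rf_model, X_test, y_test,
    n_repeats=10, random_state=42,
    scoring="neg_mean_squared_error",
)

# Print features from most to least important
for j in result.importances_mean.argsort()[::-1]:
    print(f"feature {j}: {result.importances_mean[j]:.4f} "
          f"+/- {result.importances_std[j]:.4f}")
```

Comparing these test-set importances against the model's intrinsic, training-derived ones is a quick way to spot features the model relied on that don't actually generalize.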