Variable importance
Statistical models such as linear regression and logistic regression indicate which variables are significant through measures such as p-values and t-statistics. In a decision tree, each split is made on a single variable. Depending on the number of variables specified for the surrogate splits, a certain variable may appear as the split criterion more than once in the tree, and some variables may never appear in the tree splits at all. At each split, we select the variable that leads to the maximum reduction in impurity, so the contribution of a variable differs from one split to another. The overall improvement attributed to a variable, accumulated across the splits of the tree (measured by the reduction in impurity for the classification tree or by the improvement in the split criterion), is referred to as the variable importance. In the case of ensemble methods such as bagging and random forest, the variable importance is measured for each tree in the ensemble. While the concept of variable importance...
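To make the accumulation of impurity reduction concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions rather than the implementation of any particular package: it uses the Gini index as the impurity measure, weights each split's reduction by the share of observations reaching that node, and ignores surrogate splits altogether. The function names and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (gain, variable index, threshold) of the best split, or None."""
    n = len(y)
    parent = gini(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:  # ties keep the earlier variable
                best = (gain, j, t)
    return best

def accumulate(X, y, n_total, importance, depth, max_depth):
    """Grow the tree greedily and add each split's weighted gain to its variable."""
    if depth == max_depth or len(set(y)) < 2:
        return
    split = best_split(X, y)
    if split is None or split[0] <= 0:
        return
    gain, j, t = split
    # Assumption: weight the reduction by the fraction of observations in this node.
    importance[j] += gain * len(y) / n_total
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    for idx in (left, right):
        accumulate([X[i] for i in idx], [y[i] for i in idx],
                   n_total, importance, depth + 1, max_depth)

def variable_importance(X, y, max_depth=3):
    """Total weighted impurity reduction credited to each variable."""
    importance = [0.0] * len(X[0])
    accumulate(X, y, len(y), importance, 0, max_depth)
    return importance

# Hypothetical toy data: the label is 1 only when the first two variables are
# both 1; the third variable is unrelated to the label.
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
     [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

importance = variable_importance(X, y)
print(importance)
```

In this toy example the third variable never appears in a split, so its importance stays at zero, while the first two variables receive different totals because they are used at different nodes of the tree. For an ensemble, one plausible extension is to compute this vector for each tree and then combine the vectors, for example by averaging.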