RandomForest Variable Importance
Chapter 4, Multiclass Classification with RandomForest, introduced you to a very powerful tree-based algorithm: RandomForest. It is one of the most popular algorithms in the industry, not only because it achieves very good predictive results but also because it provides several tools for interpreting its predictions, such as variable importance.
Remember from Chapter 4, Multiclass Classification with RandomForest, that RandomForest builds multiple independent trees and then averages their results to make a final prediction. We also learned that, at each node of a tree, it searches for the best split, that is, the one that most clearly separates the observations into two groups. RandomForest uses different measures to find the best split. In sklearn, you can use either the Gini or Entropy measure for classification tasks and MSE or MAE for regression (named squared_error and absolute_error in recent sklearn releases). Without going into the details of each of them, these measures calculate the level of impurity of a given split...
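To make the idea of impurity more concrete, the following is a minimal sketch rather than a description of how sklearn computes these measures internally. The helper functions gini_impurity and entropy are hypothetical names written for this illustration, and the Iris dataset, n_estimators=50, and random_state=0 are arbitrary choices. The sketch computes both measures by hand for a pure node and a perfectly mixed node, then fits a RandomForestClassifier with each value of the criterion hyperparameter:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)


def entropy(labels):
    """Shannon entropy of a node in bits: -sum(p_k * log2(p_k))."""
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return -np.sum(proportions * np.log2(proportions))


# A pure node holds a single class; a mixed node is split evenly between two.
pure_node = np.array([0, 0, 0, 0])
mixed_node = np.array([0, 0, 1, 1])

gini_pure = gini_impurity(pure_node)
gini_mixed = gini_impurity(mixed_node)
entropy_mixed = entropy(mixed_node)
print("Gini (pure, mixed):", gini_pure, gini_mixed)
print("Entropy (mixed):", entropy_mixed)

# Fit one forest per impurity measure (dataset and settings are illustrative).
X, y = load_iris(return_X_y=True)
forests = {}
for criterion in ("gini", "entropy"):
    model = RandomForestClassifier(
        n_estimators=50, criterion=criterion, random_state=0
    )
    model.fit(X, y)
    forests[criterion] = model
    # feature_importances_ holds sklearn's impurity-based importances,
    # normalized so that they sum to 1.
    print(criterion, model.feature_importances_.round(3))
```

A pure node has zero impurity under both measures, while an evenly mixed two-class node reaches the maximum for two classes. The lower the impurity of the groups produced by a split, the better the split separates the observations.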