Identifying the most important variables in data with random forests
We've already seen the random forests algorithm in use in this chapter, in the Predicting classes with random forests recipe, where we used it for class prediction and regression. Here, we're going to use it for a different purpose: to work out which of the variables in a dataset contribute most to the classification or regression accuracy of the trained model. This requires only a simple change to the code we already have and a new function or two.
Getting ready
We'll need the randomForest package and the built-in iris dataset.
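A minimal setup sketch for the requirements above might look like the following. The install step and the CRAN mirror URL are assumptions added here for convenience, so skip them if the package is already installed or you use a different mirror.

```r
# Install randomForest only if it is missing.
# The mirror URL is an assumption; any CRAN mirror will do.
if (!requireNamespace("randomForest", quietly = TRUE)) {
  install.packages("randomForest", repos = "https://cloud.r-project.org")
}
library(randomForest)

# iris ships with R's datasets package, so nothing needs downloading.
data(iris)

# Quick look at the structure: four numeric measurements plus the Species factor.
str(iris)
```

Checking the structure first is a cheap way to confirm that Species is a factor, which is what makes random forests treat the problem as classification rather than regression.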