Proximity plots
According to Hastie et al. (2009), "one of the advertised outputs of a random forest is a proximity plot" (see page 595). But what are proximity plots? If we have n observations in the training dataset, a proximity matrix of order n × n is created, with every entry initialized to 0. For each tree, whenever a pair of out-of-bag (OOB) observations falls in the same terminal node, the proximity count for that pair is increased by 1. The proximity matrix is then visualized using multidimensional scaling, a concept beyond the scope of this chapter, which represents the matrix in two dimensions. The resulting proximity plot indicates which points are close to each other from the perspective of the random forest.
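The counting rule described above can be sketched in base R on a toy example. This is a minimal illustration, not the internals of the randomForest package: the matrices leaf and oob, their values, and the conversion of counts to dissimilarities (one minus the share of trees in which a pair is counted) are all assumptions made for the sketch.

```r
# Toy setup (made-up values): 5 observations, 3 trees.
# leaf[i, b] is the terminal node reached by observation i in tree b.
leaf <- matrix(c(1, 1, 2, 2, 3,
                 1, 2, 2, 1, 1,
                 4, 4, 4, 5, 5),
               nrow = 5, ncol = 3)
# oob[i, b] is TRUE when observation i is out-of-bag for tree b.
oob <- matrix(c(TRUE, TRUE,  FALSE, TRUE,  TRUE,
                TRUE, FALSE, TRUE,  TRUE,  TRUE,
                TRUE, TRUE,  TRUE,  FALSE, TRUE),
              nrow = 5, ncol = 3)

n <- nrow(leaf)
prox <- matrix(0, n, n)  # n x n proximity matrix, initialised at 0

for (b in seq_len(ncol(leaf))) {
  idx <- which(oob[, b])                           # OOB observations for tree b
  same <- outer(leaf[idx, b], leaf[idx, b], "==")  # TRUE where a pair shares a node
  prox[idx, idx] <- prox[idx, idx] + same          # add 1 for each such pair
}

# Assumed conversion for this sketch: dissimilarity = 1 - share of trees.
# as.dist() uses only the off-diagonal entries.
d <- as.dist(1 - prox / ncol(leaf))

# Classical multidimensional scaling down to two dimensions.
coords <- cmdscale(d, k = 2)

# Draw the proximity plot when running interactively.
if (interactive()) {
  plot(coords, xlab = "Dim 1", ylab = "Dim 2", type = "n")
  text(coords, labels = seq_len(n))
}
```

In this toy data, observations 1 and 2 share a terminal node while both are OOB in two of the three trees, so they have the largest pairwise count and the smallest dissimilarity passed to the scaling step.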
When we created the random forests earlier, we did not request a proximity matrix. Thus, we first create the random forest again with the proximity option, as follows:
> GC2_RF3 <- randomForest(GC2_Formula,data=GC2_Train, + ...