Comparing bagging, random forests, and boosting
We compared the bagging and random forest methods in the previous chapter. Using the gbm
function, we now add boosting accuracy to those earlier analyses:
> data("spam")
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(spam),replace = TRUE,
+ prob = c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> spam_Train <- spam[Train_Test=="Train",]
> spam_TestX <- within(spam[Train_Test=="Test",],
+ rm(type))
> spam_TestY <- spam[Train_Test=="Test","type"]
> spam_Formula <- as.formula("type~.")
> spam_rf <- randomForest(spam_Formula,data=spam_Train,coob=TRUE,
+ ntree=500,keepX=TRUE,mtry=5)
> spam_rf_predict <- predict(spam_rf,newdata=spam_TestX,type="class")
> rf_accuracy <- sum(spam_rf_predict==spam_TestY)/nrow(spam_TestX)
> rf_accuracy
[1] 0.9436117
> spam_bag <- randomForest...
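The boosting step announced above can be sketched along the following lines. This is a hedged illustration, not the book's exact call: it assumes the spam_Train, spam_TestX, and spam_TestY objects built in the transcript, and it recodes the factor response to 0/1 because gbm's "bernoulli" distribution expects a numeric 0/1 outcome; the names spam_gbm, gbm_prob, and gbm_predict are introduced here for illustration.

```r
# Sketch of the gbm boosting step (assumes spam_Train / spam_TestX /
# spam_TestY from the transcript above; kernlab's spam has levels
# "nonspam" and "spam").
library(gbm)

# gbm with distribution = "bernoulli" expects a 0/1 numeric response,
# so recode the "type" factor before fitting.
spam_gbm_Train <- transform(spam_Train,
                            type = as.numeric(type == "spam"))
spam_gbm <- gbm(type ~ ., data = spam_gbm_Train,
                distribution = "bernoulli",
                n.trees = 500, shrinkage = 0.1)

# Predicted probabilities on the test covariates, then a 0.5 cutoff
# to recover class labels comparable with spam_TestY.
gbm_prob <- predict(spam_gbm, newdata = spam_TestX,
                    n.trees = 500, type = "response")
gbm_predict <- ifelse(gbm_prob > 0.5, "spam", "nonspam")
gbm_accuracy <- sum(gbm_predict == spam_TestY) / nrow(spam_TestX)
```

The resulting gbm_accuracy can then be set alongside the random forest and bagging accuracies computed in the transcript.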