Random forests
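The random forests algorithm builds many decision trees, each trained on a random bootstrap sample of the data and considering a random subset of the predictors at each split. It then combines the trees rather than picking a single best one: for regression it averages their predictions, and for classification it takes a majority vote.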
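To make the ensemble idea concrete, the following is a minimal sketch rather than the randomForest implementation. It assumes the rpart package (shipped with standard R installations) and the built-in mtcars data, and it simplifies by drawing a random subset of predictors once per tree, whereas a real random forest draws a new subset at every split. The ensemble size and subset size are arbitrary illustrative choices.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)
data(mtcars)                      # built-in data set, used only for illustration
predictors <- setdiff(names(mtcars), "mpg")
n_trees <- 25                     # hypothetical ensemble size

# Grow each tree on a bootstrap sample and a random subset of three predictors
# (simplification: a real random forest resamples predictors at every split)
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(mtcars), replace = TRUE)
  vars <- sample(predictors, 3)
  rpart(reformulate(vars, response = "mpg"), data = mtcars[rows, ],
        control = rpart.control(minsplit = 5))
})

# One column of predictions per tree; the ensemble prediction is the row mean
per_tree <- sapply(trees, predict, newdata = mtcars)
ensemble <- rowMeans(per_tree)
head(ensemble)
```

No single tree is returned as the answer here; every prediction is an average over all of the trees.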
Random forests in R
In R, we first install and load the package we are going to use:
install.packages("randomForest", repos="http://cran.r-project.org")
library(randomForest)
Load the data:
filename <- "http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data"
housing <- read.table(filename)
colnames(housing) <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX",
                       "RM", "AGE", "DIS", "RAD", "TAX", "PRATIO",
                       "B", "LSTAT", "MDEV")
Split the data into training and testing sets:
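# Sort by the target (optional; createDataPartition stratifies on MDEV itself)
housing <- housing[order(housing$MDEV),]
# install.packages("caret")  # uncomment if caret is not installed
library(caret)
set.seed(5557)  # make the random split reproducible
# Put about 75% of the rows in the training set
indices <- createDataPartition(housing$MDEV, p=0.75, list=FALSE)
training <- housing[indices,]
testing <- housing[-indices,]
# Check the size of each set
nrow(training)
nrow(testing)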
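For a numeric outcome, createDataPartition groups the values by percentile and samples within each group, so the training and testing sets cover a similar range of the target. The following base R sketch approximates that idea and is not caret's actual code. The vector y is simulated stand-in data, not the housing data, and the choice of five groups is an assumption made for illustration.

```r
set.seed(5557)
# Simulated stand-in for a numeric target such as MDEV (not the real data)
y <- rnorm(200, mean = 22, sd = 9)

# Bin the target into five quantile groups
breaks <- quantile(y, probs = seq(0, 1, length.out = 6))
groups <- cut(y, breaks = breaks, include.lowest = TRUE)

# Sample 75% of the row indices within each group
train_idx <- unlist(lapply(split(seq_along(y), groups), function(idx) {
  idx[sample.int(length(idx), size = ceiling(0.75 * length(idx)))]
}))

length(train_idx) / length(y)     # close to 0.75
```

Sampling within groups keeps low, middle, and high target values represented in both sets, which a plain random sample of rows does not guarantee.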
Calculate our model:
forestFit <- randomForest(MDEV ~ CRIM ...
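As a hedged illustration of the fit-and-evaluate workflow, the sketch below runs randomForest on the built-in mtcars data instead of the housing data. The formula mpg ~ ., the 24-row training split, and ntree = 500 are illustrative assumptions and do not reproduce the model call above.

```r
library(randomForest)

set.seed(5557)
data(mtcars)                                  # built-in stand-in data
train_rows <- sample(nrow(mtcars), 24)        # hypothetical 75% split
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Fit a regression forest using all remaining columns as predictors
fit <- randomForest(mpg ~ ., data = train, ntree = 500, importance = TRUE)

# Predict on the held-out rows and compute the root mean squared error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse

# Variable importance measures for each predictor
importance(fit)
```

The same pattern would apply to the training and testing data frames built earlier: fit on training, predict on testing, and compare the predictions with the observed MDEV values.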