Decision trees
In this section, we will use decision trees to predict values. A decision tree has a logical flow: starting at the root, the user makes a decision based on an attribute at each node and follows the tree down to a leaf, where a classification is provided.
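To make the root-to-leaf flow concrete, here is a minimal sketch of a hand-built tree stored as nested lists, with a function that walks it. The attribute names (`weight`, `horsepower`), the thresholds, and the `classify` helper are all hypothetical and chosen for illustration. They are not learned from the car-mpg data; rpart builds such a tree from data automatically.

```r
# Hypothetical tree: attributes, thresholds, and labels are illustrative,
# not learned from the car-mpg data.
tree <- list(
  attribute = "weight", threshold = 3000,          # root node
  left  = list(label = "good"),                    # leaf: weight < 3000
  right = list(
    attribute = "horsepower", threshold = 100,     # internal node
    left  = list(label = "good"),                  # leaf: horsepower < 100
    right = list(label = "bad")                    # leaf: horsepower >= 100
  )
)

# Start at the root and follow one branch per node until a leaf is reached;
# only leaves carry a label, so a NULL label means "keep descending".
classify <- function(node, car) {
  while (is.null(node$label)) {
    node <- if (car[[node$attribute]] < node$threshold) node$left else node$right
  }
  node$label
}

classify(tree, list(weight = 2500, horsepower = 150))  # "good"
classify(tree, list(weight = 3500, horsepower = 150))  # "bad"
```

Each call answers one question per node and stops at the first leaf. This is the same path a fitted tree follows when it classifies a new car.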
For this example, we are using automobile characteristics, such as vehicle weight, to determine whether the vehicle will produce good mileage. The data is taken from the page at https://alliance.seas.upenn.edu/~cis520/wiki/index.php?n=Lectures.DecisionTrees. I copied the data into Excel and then saved it as a CSV file for use in this example.
Decision trees in R
We load the libraries we need, rpart and caret. rpart provides the decision tree modeling functions, and caret provides the data partition function:
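library(rpart)   # decision tree modeling
library(caret)   # createDataPartition() for the train/test split
set.seed(3277)   # make the random split reproducible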
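As a quick sketch of what rpart provides, the following fits a classification tree to the `kyphosis` data frame that ships with rpart and predicts a class for each row. `kyphosis` is only a stand-in chosen because it needs no external file. This is not the car-mpg model built in this section.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the car-mpg data here.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

print(fit)  # text view of the splits, from the root down to the leaves

# type = "class" returns one predicted class label per row
pred <- predict(fit, newdata = kyphosis, type = "class")
table(predicted = pred, actual = kyphosis$Kyphosis)  # confusion table
```

Here `method = "class"` requests a classification tree. Predicting on the same rows used for fitting only demonstrates the API. A fair evaluation uses a held-out set, which is why the data below is split first.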
We load our mpg dataset and split it into training and testing sets:
carmpg <- read.csv("car-mpg.csv")
# pick 75% of the rows for training, sampled so the mpg distribution is preserved
indices <- createDataPartition(carmpg$mpg, p=0.75, list=FALSE)
training <- carmpg[indices,]
testing...