Predicting class membership on synthetic 2D data
Our first example showcasing tree-based methods in R will operate on a synthetic dataset that we have created. The dataset can be generated using commands in the companion R file for this chapter, available from the publisher. The data consists of 287 observations of two input features, x1 and x2. The output variable is a categorical variable with three possible classes: a, b, and c. If we follow the commands in the code file, we will end up with a data frame in R, mcdf:
> head(mcdf, n = 5)
        x1       x2 class
1 18.58213 12.03106     a
2 22.09922 12.36358     a
3 11.78412 12.75122     a
4 23.41888 13.89088     a
5 16.37667 10.32308     a
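The companion file is not reproduced here, so the following is only a rough sketch of how a data frame with the same shape could be built: 287 rows, two numeric features, and a three-level class label. The name mcdf_sketch, the class centres, the spreads, the per-class counts, and the seed are all assumptions made for illustration; they are not the values used in the companion code, and the result will not match the output shown above.

```r
# Hypothetical stand-in for the companion script. Every numeric value here
# (centres, standard deviations, class sizes, seed) is an assumption.
set.seed(42)

# Assumed class sizes; chosen only so that they sum to 287 observations.
n_per_class <- c(a = 96, b = 96, c = 95)

# Assumed (x1, x2) centres, placed far enough apart to keep classes separated.
centres <- list(a = c(18, 12), b = c(35, 25), c = c(10, 30))

# Draw one class as independent Gaussian noise around its centre.
make_class <- function(label) {
  n <- n_per_class[[label]]
  data.frame(
    x1 = rnorm(n, mean = centres[[label]][1], sd = 3),
    x2 = rnorm(n, mean = centres[[label]][2], sd = 2),
    class = label
  )
}

# Stack the three classes into a single data frame and make class a factor.
mcdf_sketch <- do.call(rbind, lapply(names(n_per_class), make_class))
mcdf_sketch$class <- factor(mcdf_sketch$class)

# Inspect the structure and the number of observations per class.
str(mcdf_sketch)
table(mcdf_sketch$class)
```

Storing the label as a factor matters for what follows, because R's modeling functions generally treat a factor response as a classification target rather than a numeric one.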
This problem is actually very simple, for two reasons: the dataset is very small, with only two features, and the classes happen to be quite well separated in the feature space, which is rare in practice. Nonetheless, our objective in this section is to demonstrate the construction...