Summary
We began with the idea of recursive partitioning and explained why such an approach is practical. The CART technique was demystified through the getNode function, which we defined according to whether a regression tree or a classification tree is required. With that understanding in place, we applied the rpart function to the German credit data, and its results raised two problems.
First, the fitted classification tree appeared to overfit the data. This problem can often be addressed through the minsplit and cp options. Second, the performance on the validation partition was poor. Although the reduced classification trees performed slightly better than the initial tree, the classification tree still needs to be improved.
The next chapter focuses on this aspect and discusses modern developments of CART. The user can now develop decision trees using either of the two software programs...