Overviewing tree-based methods for classification tasks
Tree-based methods have two major varieties: classification trees and regression trees. A classification tree predicts categorical outcomes from a finite set of possibilities, while a regression tree predicts numerical outcomes. Let's first look at the classification tree, especially the quality that makes it more popular and easy to use compared to other classification methods, such as the simple logistic regression classifier and the naïve Bayes classifier.
A classification tree creates a set of rules and partitions the data into various subspaces in the feature space (or feature domain) in an optimal way.
First question, what is a feature space?
Let's take our stroke risk data that we used in Chapter 9, Statistics for Classification, as sample data. Here's the dataset from the previous chapter for your reference. Each row is a profile for a patient that records their weight, diet habit, smoking...