Growing and pruning a classification tree
Let's start by looking at the dataset once more, as shown in Figure 10.1. To keep the demonstration of how a decision tree grows simple, we will first reduce our problem to a binary case.
For this demonstration, I will merge the middle-risk and high-risk patients into a single high-risk group. This turns the task into a binary classification problem, which is easier to explain. After working through this section, you can try the exercises on the original three-category problem for practice.
The following code snippet generates the new dataset that groups middle-risk and high-risk patients together:
df["stroke_risk"] = df["stroke_risk"].apply(lambda x: "low" if x == "low" else "high")
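To check that the regrouping behaves as intended, here is a minimal sketch that applies the same line to a small toy DataFrame. The toy rows and the label "middle" are assumptions made for illustration; they are not the chapter's actual data.

```python
import pandas as pd

# Hypothetical toy data: these rows are illustrative, not the book's dataset.
df = pd.DataFrame({
    "stroke_risk": ["low", "middle", "high", "low", "middle"],
})

# Same regrouping as in the text: anything that is not "low" becomes "high".
df["stroke_risk"] = df["stroke_risk"].apply(lambda x: "low" if x == "low" else "high")

# Count the labels to confirm that only two classes remain.
counts = df["stroke_risk"].value_counts().to_dict()
print(counts)  # {'high': 3, 'low': 2}
```

Counting the labels after the transformation is a quick way to confirm that no third category survives and to see how balanced the two classes are before growing the tree.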
The new dataset will then look as follows:
Now, let's think about the...