Decision Trees can also be employed to solve regression problems; however, in this case, the nodes must be split in a slightly different way. Instead of using an impurity measure, one of the most common choices is to pick the feature (and threshold) that minimizes the mean squared error (MSE), computed with respect to the average prediction of a node. Let's suppose that a node, i, contains m samples with target values y_1, ..., y_m. The average prediction is as follows:

\mu_i = \frac{1}{m}\sum_{j=1}^{m} y_j
At this point, the algorithm has to evaluate all of the possible binary splits in order to find the one that minimizes the target function, which is the MSE of a node with respect to its average prediction:

MSE(i) = \frac{1}{m}\sum_{j=1}^{m} \left(\mu_i - y_j\right)^2

where \mu_i is the average prediction of node i. The chosen split is the one that minimizes the weighted sum of the MSEs of the two resulting child nodes.
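The split search described above can be sketched directly. The following is a minimal, illustrative implementation (the function names `node_mse` and `best_split` are my own, not from the text): for a single feature, it tries every distinct value as a threshold and keeps the one minimizing the weighted MSE of the two children.

```python
import numpy as np

def node_mse(y):
    # MSE of a node when it predicts the mean of its samples
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(x, y):
    # Exhaustive search over binary splits on one feature:
    # pick the threshold t minimizing the weighted MSE of the
    # two children {x <= t} and {x > t}.
    best_t, best_cost = None, np.inf
    for t in np.unique(x)[:-1]:  # exclude the max, so both sides are non-empty
        left, right = y[x <= t], y[x > t]
        cost = (len(left) * node_mse(left) + len(right) * node_mse(right)) / len(y)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Synthetic data: a step function at x = 5 plus small noise,
# so the best split should land just below 5.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.where(x < 5.0, 1.0, 3.0) + rng.normal(0.0, 0.1, 200)

t, cost = best_split(x, y)
print(t, cost)
```

With several features, the same search is simply repeated per feature and the overall best (feature, threshold) pair is selected; real implementations sort each feature once to make the scan efficient.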
Analogous to classification trees, the procedure is repeated until the MSE falls below a fixed threshold, λ. Even though it's not strictly correct, we can think of an unacceptable impurity level as corresponding to a node whose prediction has low accuracy. In fact, in a classification...