There are two main disadvantages to using decision trees. First, the splitting algorithm is greedy: at each node it chooses the attribute split that minimizes a cost function, optimizing toward a local optimum at every decision. It never explores whether a locally suboptimal split might lead to a better tree further down, so the algorithm does not produce a globally optimal tree. Second, decision trees tend to overfit the training data. For example, a branch grown from only a handful of observations may assign a very high probability to a particular class simply because that small sample happened to support it. This leads to decision trees being really good at generating predictions on the training data while generalizing poorly to unseen data.
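To make the overfitting point concrete, here is a minimal sketch assuming scikit-learn is available; the synthetic dataset, the flip_y label noise, and the max_depth=4 limit are illustrative choices, not values from the text. A fully grown tree scores near-perfectly on the training set but noticeably worse on held-out data, while a depth-limited tree narrows that gap.

```python
# Illustrative sketch (assumes scikit-learn): contrast a fully grown tree,
# which the greedy algorithm expands until its leaves are nearly pure,
# with a depth-limited tree on the same noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data; flip_y adds label noise so that
# memorizing the training set hurts test performance.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fully grown tree: keeps splitting greedily, memorizing small samples
deep_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Depth-limited tree: trades some training accuracy for generalization
shallow_tree = DecisionTreeClassifier(max_depth=4,
                                      random_state=42).fit(X_train, y_train)

print("Deep tree    - train: %.3f  test: %.3f"
      % (deep_tree.score(X_train, y_train), deep_tree.score(X_test, y_test)))
print("Shallow tree - train: %.3f  test: %.3f"
      % (shallow_tree.score(X_train, y_train),
         shallow_tree.score(X_test, y_test)))
```

Typically the deep tree reaches training accuracy close to 1.0 while its test accuracy lags well behind, which is exactly the overfitting behavior described above; constraining depth (or otherwise pruning) is the usual remedy.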




















































