Understanding decision trees
Decision trees are supervised models that can perform either classification or regression. A tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label (for classification) or a numeric value (for regression). One of the primary advantages of decision trees is their simplicity: they do not require any complex mathematical formulation, which makes them easy to understand and visualize.
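To make this concrete, here is a minimal sketch of fitting a classification tree with scikit-learn; the dataset (Iris) and the hyperparameters are illustrative assumptions, not part of the text above:

```python
# Illustrative sketch: fitting a small classification tree with scikit-learn.
# Dataset and hyperparameters are assumptions chosen for readability.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Limiting the depth keeps the resulting flowchart small enough to inspect.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

print(clf.predict(X[:2]))        # predicted class labels for two samples
print(round(clf.score(X, y), 3))  # accuracy on the training data
```

Because the fitted model is just a set of nested attribute tests, it can be rendered as text or a diagram with `sklearn.tree.export_text` or `plot_tree`.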
The goal of a decision tree is to split the data in a manner that maximizes the purity of the nodes resulting from those splits. In the context of a classification problem, “purity” refers to how homogeneous the nodes are with respect to the target variable. A perfectly pure node would contain instances of only a single class.
Decision trees achieve this by using measures of impurity, such as the Gini index or entropy (more on those shortly).
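The two impurity measures just mentioned are simple functions of the class proportions in a node, and can be sketched in a few lines of plain Python (the label lists below are made-up examples):

```python
# Sketch of the two impurity measures named above.
# Both are 0 for a perfectly pure node and grow as classes mix.
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p_k * log2(p_k)) over class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))     # 0.0 -- perfectly pure node
print(gini(["a", "a", "b", "b"]))     # 0.5 -- maximally impure for 2 classes
print(entropy(["a", "a", "b", "b"]))  # 1.0 -- one full bit of uncertainty
```

At each candidate split, the tree compares the impurity of the parent node with the weighted impurity of the children and picks the split with the largest reduction.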