Decision trees
The if/else
statement is one of the most common constructs in Python programming. By nesting and combining such statements, we can build a so-called decision tree. This is similar to an old-fashioned flowchart, although flowcharts also allow loops. The application of decision trees to machine learning is called decision tree learning. In decision tree learning, the end nodes of the tree, also known as leaves, contain the class labels of a classification problem. Each non-leaf node is associated with a Boolean condition involving feature values. The scikit-learn implementation offers Gini impurity and entropy as splitting criteria. Gini impurity measures the probability that a randomly chosen item would be misclassified if it were labeled according to the label distribution of the node, while entropy measures the information content of that distribution (see http://en.wikipedia.org/wiki/Decision_tree_learning). Decision trees are easy to understand, use, visualize, and verify. To visualize a tree, we will make use of Graphviz, which can be downloaded from http://graphviz.org/. We also need to install pydot2, as follows:
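Before turning to scikit-learn, the two ideas above can be sketched by hand: a decision tree is just nested if/else statements over feature values, and Gini impurity is a simple function of a node's label counts. The following is a toy illustration only; the feature names and thresholds are made up for the example and are not produced by any learning algorithm:

```python
def gini(labels):
    # Gini impurity: the probability that a randomly chosen item
    # would be misclassified if labeled according to the label
    # distribution of the node. 0.0 means the node is pure.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def classify(petal_length, petal_width):
    # A hand-built decision tree: each non-leaf node is a Boolean
    # condition on a feature value; each leaf returns a class label.
    # Thresholds here are illustrative, not learned.
    if petal_length < 2.5:
        return "setosa"
    else:
        if petal_width < 1.8:
            return "versicolor"
        else:
            return "virginica"

print(classify(1.4, 0.2))           # a short petal lands in the first leaf
print(gini(["a", "a", "b", "b"]))   # a 50/50 split gives impurity 0.5
```

Decision tree learning automates exactly this construction: it chooses the conditions and thresholds that reduce impurity the most at each split.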
$...