Decision tree classification with scikit-learn
scikit-learn provides the DecisionTreeClassifier class, which can train a binary decision tree using either the Gini or the cross-entropy impurity measure. In our example, let's consider a dataset with three features and three classes:
from sklearn.datasets import make_classification

>>> nb_samples = 500
>>> X, Y = make_classification(n_samples=nb_samples, n_features=3,
...                            n_informative=3, n_redundant=0,
...                            n_classes=3, n_clusters_per_class=1)
Let's first consider a classification with the default Gini impurity (the exact score changes from run to run, because the dataset is generated randomly):
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

>>> dt = DecisionTreeClassifier()
>>> print(cross_val_score(dt, X, Y, scoring='accuracy', cv=10).mean())
0.970
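Since the class supports both impurity measures, it can be useful to compare them on the same data. The following is only a minimal sketch: the seed (random_state=1000) is an arbitrary value added here to make the run repeatable, and the scores dictionary is an illustrative name, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same dataset shape as in the text; random_state is an assumption
# added here so that repeated runs produce the same samples.
X, Y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=1000)

# Mean 10-fold cross-validation accuracy for each impurity criterion.
scores = {}
for criterion in ('gini', 'entropy'):
    dt = DecisionTreeClassifier(criterion=criterion, random_state=1000)
    scores[criterion] = cross_val_score(dt, X, Y,
                                        scoring='accuracy', cv=10).mean()
    print(criterion, round(scores[criterion], 3))
```

On a small, well-separated dataset like this one, the two criteria usually give similar accuracies, so the choice often matters less than parameters such as the maximum depth.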
A particularly useful feature is the ability to export the tree in Graphviz format and convert it into a PDF.
Note
Graphviz is a free tool that can be downloaded from http://www.graphviz.org.
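As a rough idea of what the Graphviz representation looks like, the sketch below trains a tree and obtains the DOT source as a string by passing out_file=None to export_graphviz. The feature and class labels, as well as the seed, are hypothetical values chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Same dataset shape as in the text; random_state is an assumption
# added here for repeatability.
X, Y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=1000)

dt = DecisionTreeClassifier(random_state=1000)
dt.fit(X, Y)

# Hypothetical labels used only to make the exported nodes readable.
feature_names = ['feature_0', 'feature_1', 'feature_2']
class_names = ['class_0', 'class_1', 'class_2']

# With out_file=None the function returns the DOT source as a string
# instead of writing it to a file.
dot_source = export_graphviz(dt, out_file=None,
                             feature_names=feature_names,
                             class_names=class_names)

# Show only the beginning of the DOT description.
print(dot_source[:200])
```

If the string is saved to a file (for example, dt.dot, a name assumed here), the Graphviz command-line tool can convert it with: dot -Tpdf dt.dot -o dt.pdf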
To export a...