Exploring decision trees
Decision trees work by splitting data into branches. The branches are followed down to leaves, where predictions are made. Understanding how branches and leaves are created is much easier with a practical example, so before going into further detail, let's build our first decision tree model.
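To make "following branches down to leaves" concrete before any model is trained, here is a minimal sketch of a tiny tree written out by hand. The feature names (age, hours_per_week), thresholds, and leaf values are hypothetical and chosen only for illustration. A real decision tree learns its splits from data. Each internal node tests one feature against a threshold and sends the sample left (value <= threshold, the same convention scikit-learn uses) or right. Each leaf holds the prediction: 1 for over 50K, 0 otherwise.

```python
# Hypothetical hand-built tree, for illustration only: the features,
# thresholds, and leaf values are NOT learned from the Census data.
tree = {
    "feature": "age",
    "threshold": 30,
    "left": {"leaf": 0},              # age <= 30 -> predict 0 (50K or less)
    "right": {
        "feature": "hours_per_week",
        "threshold": 40,
        "left": {"leaf": 0},          # age > 30, hours <= 40 -> predict 0
        "right": {"leaf": 1},         # age > 30, hours > 40  -> predict 1 (over 50K)
    },
}


def predict(node, sample):
    """Follow branches from the root until a leaf is reached, then return its value."""
    while "leaf" not in node:
        # Values at or below the threshold go left; larger values go right.
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]


print(predict(tree, {"age": 25, "hours_per_week": 50}))  # 0
print(predict(tree, {"age": 45, "hours_per_week": 50}))  # 1
print(predict(tree, {"age": 45, "hours_per_week": 35}))  # 0
```

Every prediction is a single path from the root to one leaf, so the cost of predicting grows with the depth of the tree rather than with the number of training rows.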
First decision tree model
We start by building a decision tree that predicts whether someone makes over 50K US dollars, using the Census dataset from Chapter 1, Machine Learning Landscape:
First, open a new Jupyter Notebook and start with the following imports:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
Next, open the file 'census_cleaned.csv' that has been uploaded for you at https://github.com/PacktPublishing/Hands-On-Gradient-Boosting-with-XGBoost-and-Scikit-learn/tree/master/Chapter02. If you downloaded all files for this book from the Packt GitHub page, as recommended in the preface...