Now that we have learned how to evaluate the model's accuracy more reliably using the ShuffleSplit cross-validation method, it is time to test our earlier hypothesis: would a smaller tree be more accurate?
Here is what we are going to do in the following subsections:
- Split the data into training and test sets.
- Set the test set aside for now.
- Limit the tree's growth using different values of max_depth.
- For each max_depth setting, use the ShuffleSplit cross-validation method on the training set to estimate the classifier's accuracy.
- Once we decide which value to use for max_depth, train the classifier one last time on the entire training set and predict on the test set.
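The steps above can be sketched in code roughly as follows. This is an illustrative outline, not the book's exact listing: the Iris dataset, the candidate depth grid, the 75/25 split, and the random seeds are all assumptions made for the sake of a runnable example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; substitute your own features and labels.
x, y = load_iris(return_X_y=True)

# Split once, and set the test set aside until the very end.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

# ShuffleSplit cross-validation on the training set only.
cv = ShuffleSplit(n_splits=20, test_size=0.25, random_state=42)

# Estimate accuracy for each candidate max_depth (None means unlimited growth).
mean_scores = {}
for max_depth in [1, 2, 3, 5, 10, None]:
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    scores = cross_val_score(clf, x_train, y_train, cv=cv)
    mean_scores[max_depth] = scores.mean()

# Pick the depth with the highest mean cross-validated accuracy.
best_depth = max(mean_scores, key=mean_scores.get)

# Retrain on the entire training set with the chosen depth, then test once.
final_clf = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
final_clf.fit(x_train, y_train)
test_accuracy = final_clf.score(x_test, y_test)
```

Note that the test set is touched exactly once, after max_depth has been chosen; all model selection happens inside the cross-validation loop on the training data.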
Splitting the data
Here is the usual code for splitting the data into training and test sets (x and y stand for the feature matrix and labels; the split ratio and seed are illustrative):

```python
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)
```