We will model the California housing dataset with gradient boosted trees. Our overall approach will be the same as before:
- Focus on the important parameters of the gradient boosting algorithm:
  - max_features
  - max_depth
  - min_samples_leaf
  - learning_rate
  - loss
- Create a parameter distribution where the most important parameters are varied.
- Perform a randomized search over that parameter distribution. If using an ensemble, keep the number of estimators low at first so that each candidate trains quickly.
- Use the best parameters from the previous step with many estimators.