Regularization of Random Forest
Since a Random Forest is made up of decision trees, it shares many hyperparameters with them. But a few more hyperparameters exist, so in this recipe, we will present them and show how to use them to improve results on the California housing regression dataset.
Getting started
Random Forests are known to be quite prone to overfitting. While it is no formal proof, we did indeed face quite strong overfitting in the previous recipe. But Random Forests, like decision trees, have many hyperparameters that allow us to try to reduce overfitting. As for a decision tree, we can use the following hyperparameters (a short sketch of their use follows the list):
- Maximum depth (max_depth)
- Minimum samples per leaf (min_samples_leaf)
- Minimum samples per split (min_samples_split)
- max_features
- max_leaf_nodes
- min_impurity_decrease
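Here is a minimal sketch of how these tree-level hyperparameters can be applied, assuming scikit-learn's RandomForestRegressor and its built-in fetch_california_housing loader; the specific values are illustrative rather than tuned:

```python
# A minimal sketch: regularizing a Random Forest with tree-level
# hyperparameters on the California housing dataset (values are
# illustrative, not tuned).
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load the California housing regression dataset
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Constrain each tree using the hyperparameters listed above
rf = RandomForestRegressor(
    max_depth=10,          # cap the depth of every tree
    min_samples_leaf=5,    # at least 5 samples in each leaf
    min_samples_split=10,  # at least 10 samples to split a node
    max_features=0.5,      # consider half of the features at each split
    random_state=0,
)
rf.fit(X_train, y_train)

# Comparing train and test R2 scores gives a rough measure of overfitting
print("train R2:", rf.score(X_train, y_train))
print("test R2:", rf.score(X_test, y_test))
```

A narrowing gap between the train and test scores, at an acceptable test score, is the usual sign that these constraints are helping.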
But some other hyperparameters can be fine-tuned too:
n_estimators
: This is the number of decision trees trained in the Random Forest (a sketch of its effect follows below).
max_samples...
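As a quick illustration of n_estimators, the following sketch (again assuming scikit-learn, with an arbitrary grid of values) fits forests of increasing size and reports train and test scores; more trees mostly reduce variance at the cost of training time:

```python
# A sketch of sweeping n_estimators; the grid of values is arbitrary.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (10, 50, 100, 300):
    rf = RandomForestRegressor(n_estimators=n, random_state=0, n_jobs=-1)
    rf.fit(X_train, y_train)
    print(f"n_estimators={n:>3}: "
          f"train R2={rf.score(X_train, y_train):.3f}, "
          f"test R2={rf.score(X_test, y_test):.3f}")
```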