Implementing a regression forest
In Chapter 3, Predicting Online Ad Click-Through with Tree-Based Algorithms, we explored random forests as an ensemble learning method that combines multiple decision trees, each trained separately and with a random subset of the training features considered at each node. In classification, a random forest makes its final decision by a majority vote over the decisions of all its trees. Applied to regression, a random forest regression model (also called a regression forest) takes the average of the regression results from all its decision trees as the final prediction.
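To make the averaging step concrete, here is a minimal sketch that builds a tiny forest by hand from scikit-learn's DecisionTreeRegressor. It is an illustration under stated assumptions rather than how a library implements it: the data is synthetic (generated with make_regression, not the California housing data), each tree is trained on a bootstrap sample of the rows, and the tree count, depth, and max_features values are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration only (an assumption, not the housing dataset)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

rng = np.random.default_rng(42)
trees = []
for i in range(10):
    # Bootstrap sample: draw training rows with replacement for this tree
    idx = rng.integers(0, len(X), size=len(X))
    # max_features=3 makes every split consider a random subset of 3 features
    tree = DecisionTreeRegressor(max_depth=5, max_features=3, random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Each tree predicts on its own; the forest output is the mean across trees
per_tree_preds = np.stack([t.predict(X[:5]) for t in trees])  # shape (10, 5)
forest_preds = per_tree_preds.mean(axis=0)                    # shape (5,)
print(forest_preds)
```

Averaging many trees that were trained on different samples and feature subsets tends to smooth out the errors of any single tree, which is the motivation for using a forest in place of one deep tree.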
Here, we will use the regression forest class, RandomForestRegressor, from scikit-learn and apply it to our California house price prediction example:
>>> from sklearn.ensemble import RandomForestRegressor
>>> regressor = RandomForestRegressor(n_estimators=100,
...                                   max_depth=10,
...                                   min_samples_split=3,
...