Exploring random forests
To get a better sense of how random forests work, let's build one using scikit-learn.
Random forest classifiers
Let's use a random forest classifier to predict whether a user makes more or less than USD 50,000, using the census dataset we cleaned and scored in Chapter 1, Machine Learning Landscape, and revisited in Chapter 2, Decision Trees in Depth. We are going to use cross_val_score, which scores the model across several train/test folds rather than a single split, to check that our test results generalize well.
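To make the role of cross_val_score concrete before turning to the census data, here is a minimal sketch on a synthetic dataset. The data from make_classification, the 5-fold setting, and the n_estimators and random_state values are all assumptions chosen for illustration; none of them come from the census example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset (an assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=2)

# n_estimators and random_state are illustrative choices, not values from the text.
rf = RandomForestClassifier(n_estimators=10, random_state=2)

# cross_val_score fits and scores a fresh copy of the model once per fold,
# so every row is used for testing exactly once across the 5 folds.
scores = cross_val_score(rf, X, y, cv=5)

print('Accuracy per fold:', np.round(scores, 3))
print('Mean accuracy: %0.3f' % scores.mean())
```

The result is one accuracy score per fold. Reporting the mean of those scores is less sensitive to a lucky or unlucky split than a single train/test evaluation would be.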
The following steps build and score a random forest classifier using the census dataset:
Import pandas, numpy, RandomForestClassifier, and cross_val_score before silencing warnings:
    import pandas as pd
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    import warnings
    warnings.filterwarnings('ignore')
Load the dataset census_cleaned.csv and split it into X (the predictor columns) and y (the target column):
    df_census = pd.read_csv...
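The split into X and y can be sketched on a small hypothetical DataFrame. The toy data and column names below are invented for illustration, and the sketch assumes the target is stored in the last column; it does not reproduce the actual loading code for census_cleaned.csv.

```python
import pandas as pd

# Hypothetical toy frame; the column names and values are invented for this sketch.
df_toy = pd.DataFrame({
    'age': [39, 50, 38, 53],
    'hours_per_week': [40, 13, 40, 40],
    'income_over_50k': [0, 1, 0, 1],
})

# Assumption: the target is the last column and every earlier column is a predictor.
X = df_toy.iloc[:, :-1]
y = df_toy.iloc[:, -1]

print(X.shape, y.shape)
```

Selecting by position with iloc keeps the split independent of column names, which is convenient when a cleaned dataset has many encoded columns.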