Fitting a random forest model
We will use the combined dataset to perform preliminary tests and fit the model. Both the tests and the model require the sklearn module, which is installed using pip from the command prompt (or from within the Notebook):
- In the next cell, install the sklearn module (published on PyPI as scikit-learn; the old sklearn package name is deprecated) and import the following. Note that the module may already be installed:
pip install scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
- Run the cell.
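As a rough preview of how the three imports fit together, here is a minimal sketch on synthetic data. The arrays, the 25% test split, and the model settings are assumptions chosen for illustration; they are not the chapter's dataset or parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the chapter's dataset):
# 200 rows, 3 numeric features, and a target driven by the first two.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.1, size=200)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the forest on the training rows only.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score predictions on the held-out rows with R squared.
predictions = model.predict(X_test)
score = r2_score(y_test, predictions)
print(f"R^2 on held-out data: {score:.3f}")
```

Fixing random_state makes the split and the forest reproducible between runs, which helps when comparing preliminary tests.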
Before running any tests, note that the random forest model only accepts numeric variables, so every categorical variable – specifically the 'Item' field – will need to be converted to numeric: each distinct value will be represented by a number. Some cleanup is needed as well, namely removing null values and dropping redundant columns...
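One possible way to carry out this preparation with pandas is sketched below. Only the 'Item' column name comes from the text; the sample rows and the 'Year', 'Year_copy', and 'Yield' columns are hypothetical, and pd.factorize is just one of several encoding options.

```python
import pandas as pd

# Hypothetical sample data; only the 'Item' column name comes from the text.
df = pd.DataFrame({
    "Item": ["Maize", "Wheat", "Maize", None, "Rice"],
    "Year": [2000, 2000, 2001, 2001, 2002],
    "Year_copy": [2000, 2000, 2001, 2001, 2002],  # redundant duplicate column
    "Yield": [3.1, 2.4, 3.3, 2.9, None],
})

# Drop the redundant column, then drop any rows that contain null values.
df = df.drop(columns=["Year_copy"]).dropna().reset_index(drop=True)

# Replace each distinct 'Item' value with an integer code.
# 'categories' keeps the original labels so the codes can be mapped back.
codes, categories = pd.factorize(df["Item"])
df["Item"] = codes

print(df)
print(list(categories))
```

Integer codes impose an arbitrary order on the categories. Tree-based models usually tolerate this, but one-hot encoding (for example with pd.get_dummies) is a common alternative when that ordering is a concern.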