Estimating forest elevation
In this recipe, we will build two regression models that will predict forest elevation: the random forest regression model and the gradient-boosted trees regressor.
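Both models are ensembles of regression trees, but they combine those trees differently at prediction time. The minimal pure-Python sketch below illustrates the difference. The trees are hand-written stand-ins for fitted ones, and the feature name `slope` and all numbers are hypothetical. The `step_size` argument plays the role of the learning rate that Spark exposes on `GBTRegressor` as `stepSize`.

```python
from statistics import mean

# Hand-written stand-ins for fitted regression trees (hypothetical): each maps a
# row (a dict of feature values) to a predicted elevation in metres.
forest_trees = [
    lambda row: 3000.0 if row["slope"] < 15 else 2800.0,
    lambda row: 2950.0 if row["slope"] < 20 else 2850.0,
    lambda row: 3100.0 if row["slope"] < 10 else 2900.0,
]

def random_forest_predict(trees, row):
    """Random forest regression: independently trained trees, predictions averaged."""
    return mean(tree(row) for tree in trees)

# Boosting: the first tree predicts the target, and each later tree predicts
# the residual error left by the trees before it.
boosted_trees = [
    lambda row: 2900.0,
    lambda row: 100.0 if row["slope"] < 15 else -100.0,
    lambda row: 40.0 if row["slope"] < 12 else -20.0,
]

def boosted_predict(trees, row, step_size=0.1):
    """Gradient boosting: first tree at full weight, later trees scaled by step_size."""
    first, *rest = trees
    return first(row) + sum(step_size * tree(row) for tree in rest)

row = {"slope": 12}
print(random_forest_predict(forest_trees, row))
print(boosted_predict(boosted_trees, row))
```

Because the random forest's trees are independent, they can be trained in parallel. Boosted trees must be trained one after another, since each one is fitted to the errors of the ensemble built so far.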
Getting ready
To execute this recipe, you will need a working Spark environment, and you should have already loaded the data into the forest DataFrame.
No other prerequisites are required.
How to do it...
In this recipe, we will build only a two-stage Pipeline with the .VectorAssembler(...) and the .RandomForestRegressor(...) stages. We will skip the feature selection stage, as it is not currently an automated process.
Note
You can do this manually. Just check the Selecting the most predictable features recipe earlier in this chapter.
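One simple way to do this selection by hand for a continuous label like Elevation is to rank the candidate features by the strength of their linear correlation with the label and keep the top few. This is not necessarily the method the earlier recipe uses, and Pearson correlation only captures linear relationships. The sketch below is pure Python. The column names and values are hypothetical, and it assumes no column is constant.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (non-constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def top_k_by_correlation(data, label, k):
    """Rank feature columns by |Pearson r| with the label and keep the k strongest."""
    scores = {
        col: abs(pearson(values, data[label]))
        for col, values in data.items()
        if col != label
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical columnar data: the label plus three candidate features.
data = {
    "Elevation": [2500, 2600, 2700, 2800, 2900],
    "feature_a": [1, 2, 3, 4, 5],   # rises with Elevation
    "feature_b": [5, 3, 4, 1, 2],   # mostly falls as Elevation rises
    "feature_c": [2, 1, 2, 1, 2],   # no linear relationship
}

print(top_k_by_correlation(data, "Elevation", 2))
```

The surviving column names could then be passed as the inputCols of the VectorAssembler instead of all the columns.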
Here's the full code:
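import pyspark.ml.feature as feat
import pyspark.ml.regression as rg

# Assemble every column after the first (the 'Elevation' label) into one vector
vectorAssembler = feat.VectorAssembler(
    inputCols=forest.columns[1:]
    , outputCol='features')

# Random forest of 10 trees, each limited in depth, node size, and split gain
rf_obj = rg.RandomForestRegressor(
    labelCol='Elevation'
    , maxDepth=10
    , minInstancesPerNode=10
    , minInfoGain=0.1
    , numTrees=10
)
...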
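The minInstancesPerNode and minInfoGain parameters decide whether a candidate split is kept while each tree grows. The pure-Python sketch below approximates that check. It assumes the variance impurity that Spark uses for regression trees, where the gain is the parent's variance minus the size-weighted variances of the two children. The function names are hypothetical, and maxDepth and numTrees are not modelled.

```python
from statistics import pvariance

def split_gain(parent, left, right):
    """Variance-impurity gain: parent variance minus size-weighted child variances."""
    n = len(parent)
    weighted = len(left) / n * pvariance(left) + len(right) / n * pvariance(right)
    return pvariance(parent) - weighted

def split_allowed(parent, left, right, min_instances_per_node=10, min_info_gain=0.1):
    """Approximate the two rules set on rf_obj: each child needs enough rows,
    and the split must reduce impurity by at least min_info_gain."""
    # Size check first, so empty children never reach pvariance().
    if len(left) < min_instances_per_node or len(right) < min_instances_per_node:
        return False
    return split_gain(parent, left, right) >= min_info_gain

# Hypothetical elevations: a split that cleanly separates low and high terrain.
left = [2500.0] * 10
right = [3000.0] * 10
print(split_gain(left + right, left, right))
print(split_allowed(left + right, left, right))
```

A split that leaves fewer than 10 rows in either child is rejected regardless of its gain. A split whose children are just as mixed as the parent has zero gain and is rejected as well.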