In this recipe, we explore the Random Forest implementation in Spark and use it to solve a classification problem with discrete class labels. We found the implementation very fast because Spark exploits parallelism by growing many trees at once. Random Forest also needs little hyper-parameter tuning: in practice, setting just the number of trees is often enough to get a usable model.
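As a rough illustration of how little configuration is needed, the following minimal sketch trains a forest on a tiny made-up dataset using the RDD-based `RandomForest.trainClassifier` call. The object name `RandomForestSketch`, the toy data, the local master setting, and every parameter value other than the number of trees are assumptions chosen for illustration, not the recipe's own code or data.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.sql.SparkSession

// Hypothetical object name; not part of the recipe's package.
object RandomForestSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RandomForestSketch")
      .getOrCreate()

    // Made-up two-feature, two-class data, for illustration only.
    val data = spark.sparkContext.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.1, 0.2)),
      LabeledPoint(0.0, Vectors.dense(0.2, 0.1)),
      LabeledPoint(0.0, Vectors.dense(0.3, 0.3)),
      LabeledPoint(1.0, Vectors.dense(0.9, 0.8)),
      LabeledPoint(1.0, Vectors.dense(0.8, 0.9)),
      LabeledPoint(1.0, Vectors.dense(0.7, 0.7))
    ))

    // The one setting we deliberately choose.
    val numTrees = 20

    val model = RandomForest.trainClassifier(
      data,
      2,                // numClasses: labels are 0.0 and 1.0
      Map[Int, Int](),  // categoricalFeaturesInfo: empty, all features continuous
      numTrees,
      "auto",           // featureSubsetStrategy: let Spark pick
      "gini",           // impurity measure
      4,                // maxDepth (illustrative value)
      32                // maxBins (illustrative value)
    )

    // Classify one unseen point and report the size of the forest.
    val prediction = model.predict(Vectors.dense(0.95, 0.9))
    println(s"Trees grown: ${model.numTrees}, predicted label: $prediction")

    spark.stop()
  }
}
```

Everything besides `numTrees` is set to a commonly used value here; on real data these settings may still be worth revisiting.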
Building a classification system with Random Forest Trees in Spark 2.0
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter10
- Import the necessary...