Building a classification system with Gradient Boosted Trees (GBT) in Spark 2.0
In this recipe, we will explore Gradient Boosted Tree (GBT) classification in Spark. GBT needs more careful hyper-parameter tuning, and usually several trial runs, before you settle on a final model. Keep in mind that it is perfectly fine to grow shallower trees when using GBT, because boosting builds its ensemble from many weak learners, each one correcting the errors of those before it.
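As a rough illustration of the hyper-parameters mentioned above, the following sketch configures a boosting strategy with shallow trees using the MLlib BoostingStrategy class. The object name GbtConfigSketch, the helper shallowGbtStrategy, and the values passed to it are hypothetical and chosen only for illustration; they are not the settings used later in this recipe.

```scala
import org.apache.spark.mllib.tree.configuration.BoostingStrategy

// Hypothetical helper object, not part of the recipe's own code.
object GbtConfigSketch {

  // Builds a classification boosting strategy with the given settings.
  // All argument values are assumptions to be tuned by trial runs.
  def shallowGbtStrategy(numIterations: Int,
                         maxDepth: Int,
                         learningRate: Double): BoostingStrategy = {
    // Start from MLlib's defaults for binary classification.
    val strategy = BoostingStrategy.defaultParams("Classification")

    // Number of boosting rounds, that is, the number of trees in the ensemble.
    strategy.numIterations = numIterations

    // Shrinkage applied to each tree's contribution.
    strategy.learningRate = learningRate

    // Shallow trees are acceptable here: each tree is only a weak learner.
    strategy.treeStrategy.maxDepth = maxDepth
    strategy.treeStrategy.numClasses = 2

    // An empty map declares every feature as continuous.
    strategy.treeStrategy.categoricalFeaturesInfo = Map[Int, Int]()

    strategy
  }
}
```

A call such as GbtConfigSketch.shallowGbtStrategy(20, 3, 0.1) returns a strategy that can be passed to GradientBoostedTrees.train together with an RDD of LabeledPoint. Trying a few combinations of these three values is one simple way to carry out the trial runs described above.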
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
- Set up the package location where the program will reside:
package spark.ml.cookbook.chapter10
- Import the necessary packages for the Spark context:
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache...