Inside the project, in the package folder prediction.training, there is a Scala file called TrainGBT.scala. Before launching it, you have to specify or change four things:
- In the code, set spark.sql.warehouse.dir to a real location on your computer that has several gigabytes of free space: set("spark.sql.warehouse.dir", "/home/user/spark")
- rootDir is the main folder, where all files and trained models will be stored: rootDir = "/home/user/projects/btc-prediction/"
- Make sure that the x filename matches the one produced by the Scala script in the preceding step: x = spark.read.format("com.databricks.spark.csv").schema(xSchema).load(rootDir + "scala_test_x.csv")
- Make sure that the y filename matches the one produced by the Scala script: y_tmp = spark.read...
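
The four settings above can be gathered in one place and checked before training starts. The following is only a minimal sketch, not the code of TrainGBT.scala itself. It assumes Spark 2.x or later running in local mode. The object name TrainGBTSettingsSketch, the value names warehouseDir, xFileName and yFileName, and the application name are hypothetical. The y filename is a placeholder because it is not given here, so the check fails until you replace it.

```scala
import java.nio.file.{Files, Paths}

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the project.
object TrainGBTSettingsSketch {
  // The first three values are the examples used in the list above.
  val warehouseDir = "/home/user/spark"
  val rootDir      = "/home/user/projects/btc-prediction/"
  val xFileName    = "scala_test_x.csv"
  // Placeholder: replace with the y file produced in the preceding step.
  val yFileName    = "REPLACE_WITH_Y_FILE.csv"

  def main(args: Array[String]): Unit = {
    // Fail early, with a readable message, if an input file is missing.
    for (name <- Seq(xFileName, yFileName)) {
      val path = Paths.get(rootDir, name)
      require(Files.exists(path), s"Input file not found: $path")
    }

    val conf = new SparkConf()
      .setMaster("local[*]") // assumption: everything runs on one machine
      .setAppName("TrainGBT settings check")
      .set("spark.sql.warehouse.dir", warehouseDir)

    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Read the x file without a schema, only to confirm that it loads.
    val xPreview = spark.read.format("csv").load(rootDir + xFileName)
    println(s"x file loaded with ${xPreview.columns.length} columns")

    spark.stop()
  }
}
```

Running this check first means that a wrong path shows up as a clear error message rather than as a failure in the middle of the training job.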