Model training for prediction
Inside the project, in the package folder prediction.training, there is a Scala object called TrainGBT.scala. Before launching it, you have to specify or change four things:
- In the code, set spark.sql.warehouse.dir to a real location on your computer that has several gigabytes of free space: set("spark.sql.warehouse.dir", "/home/user/spark")
- The rootDir is the main folder, where all files and trained models will be stored: rootDir = "/home/user/projects/btc-prediction/"
- Make sure that the x filename matches the one produced by the Scala script in the preceding step: x = spark.read.format("com.databricks.spark.csv").schema(xSchema).load(rootDir + "scala_test_x.csv")
- Make sure that the y filename matches the one produced by the Scala script: y_tmp = spark.read.format("com.databricks.spark.csv").schema(ySchema).load(rootDir + "scala_test_y.csv")
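Because a wrong path only shows up after Spark has started, it can help to verify the settings above before launching. The following is a minimal sketch that is not part of the project: the object name CheckTrainingPaths and the helper missingPaths are hypothetical, and the paths are the example values from the list, which you should replace with your own. It uses only the Java standard library, so it runs without Spark.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical pre-flight check; not part of the original project.
object CheckTrainingPaths {
  // Example values taken from the list above; adjust them to your machine.
  val rootDir = "/home/user/projects/btc-prediction/"
  val xFile = "scala_test_x.csv"
  val yFile = "scala_test_y.csv"

  /** Returns one message for every required path that does not exist. */
  def missingPaths(root: String, files: Seq[String]): Seq[String] = {
    // The root folder must already exist as a directory.
    val rootProblems =
      if (Files.isDirectory(Paths.get(root))) Seq.empty[String]
      else Seq(s"missing directory: $root")
    // Each input file is resolved relative to the root folder.
    val fileProblems = files
      .map(f => Paths.get(root, f))
      .filterNot(p => Files.isRegularFile(p))
      .map(p => s"missing file: $p")
    rootProblems ++ fileProblems
  }

  def main(args: Array[String]): Unit = {
    val problems = missingPaths(rootDir, Seq(xFile, yFile))
    if (problems.isEmpty) println("All paths found; TrainGBT can be launched.")
    else problems.foreach(println)
  }
}
```

If the check reports a missing file, compare the filename with the output of the script from the preceding step before changing anything in TrainGBT.scala.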
The code for training uses the Apache Spark ML library (and libraries required for it) to train the classifier, which means...