Training bisecting K-means in Spark ML follows the same approach as the other models: we pass a DataFrame that contains our training data to the fit method of the BisectingKMeans object. Note that here we use the libsvm data format:
- Create the Spark session and load the training data:
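    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Local, single-threaded configuration for this example
    val spConfig = new SparkConf()
      .setMaster("local[1]")
      .setAppName("SparkApp")
      .set("spark.driver.allowMultipleContexts", "true")

    val spark = SparkSession
      .builder()
      .appName("Spark SQL Example")
      .config(spConfig)
      .getOrCreate()

    // Load the user data stored in libsvm format and preview three rows
    val datasetUsers = spark.read.format("libsvm").load(
      BASE + "/movie_lens_2f_users_libsvm/part-00000")
    datasetUsers.show(3)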
The output of the command show(3) is shown here:
...
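To make the libsvm format mentioned above more concrete, here is a minimal sketch in plain Scala of how one line of such a file maps to a label and a sparse feature vector. Each line has the form `label index:value index:value ...`, with one-based indices in the file; Spark's libsvm reader converts them to zero-based indices when it builds the `features` column. The `LibSvmRow` case class, the `parseLine` helper, and the sample line are hypothetical and only illustrate the layout; they are not part of Spark or of the dataset used here.

```scala
// Hypothetical container for one parsed libsvm line (not a Spark class).
final case class LibSvmRow(label: Double, indices: Array[Int], values: Array[Double])

object LibSvmSketch {

  // Parses a single line of the form "label index:value index:value ...".
  // File indices are one-based, so we subtract 1 to get zero-based indices,
  // mirroring what Spark's libsvm data source does on load.
  def parseLine(line: String): LibSvmRow = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { token =>
      val Array(index, value) = token.split(":")
      (index.toInt - 1, value.toDouble)
    }
    LibSvmRow(label, pairs.map(_._1), pairs.map(_._2))
  }

  def main(args: Array[String]): Unit = {
    // Invented sample line with two features, for illustration only.
    val row = parseLine("1 1:0.5 2:-1.25")
    println(s"label=${row.label}")
    println(s"indices=${row.indices.mkString(",")}")
    println(s"values=${row.values.mkString(",")}")
  }
}
```

In the DataFrame returned by `spark.read.format("libsvm")`, the same information appears as a `label` column and a sparse-vector `features` column, which is the input that the fit method expects.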