The examples shown in this chapter can be scaled to much larger datasets to serve different purposes. You can package all three clustering algorithms with their required dependencies and submit them as Spark jobs to a cluster. For example, use the following commands to submit the K-means clustering job for the Saratoga NY Homes dataset (the syntax is similar for the other classes):
# Run the application locally on 8 cores
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.KMeansDemo \
--master local[8] \
KMeansDemo-0.1-SNAPSHOT-jar-with-dependencies.jar \
Saratoga_NY_Homes.txt
# Run on a YARN cluster (--deploy-mode can be client for client mode...)
export HADOOP_CONF_DIR=XXX
$SPARK_HOME/bin/spark-submit \
--class org.apache.spark.examples.KMeansDemo \
--master yarn \
--deploy-mode cluster \