Using the Cluster Launch Scripts to Start a Standalone Cluster
We have seen how easy it is to launch a cluster with minimal effort. But the cluster we set up was relatively small, with only 5 worker nodes. Imagine setting up a 5,000-node cluster by following the same steps: it would be neither easy nor maintainable, especially when it comes to adding or removing nodes. Spark therefore provides launch scripts so that you don't have to perform such configuration manually.
Before running any of the launch scripts, you have to make sure the workers are reachable from the master over SSH. You will need one of the following:
- Password-less SSH from the master to each worker.
- The SPARK_SSH_FOREGROUND environment variable set, in which case the scripts run serially and you provide the password for each worker in turn.
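A minimal sketch of the password-less setup: generate a key pair on the master and copy the public key to each worker. The hostname `worker1` and the default key path are illustrative placeholders, not values from this text.

```shell
# On the master: create a key pair if one does not already exist
# (-N "" means no passphrase, so scripts can connect unattended)
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to a worker; you will be prompted for that
# worker's password once. Repeat for every worker in the cluster.
ssh-copy-id user@worker1
```

After this, `ssh user@worker1` from the master should log in without a password, which is what the launch scripts rely on.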
In addition to the SSH configuration, you will need to create a file called slaves in the conf directory, listing the hostnames of all machines that should run as workers, one per line.
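A sketch of what this looks like in practice, assuming Spark is installed at `$SPARK_HOME`; the worker hostnames are illustrative placeholders:

```shell
# conf/slaves: one worker hostname per line
cat > "$SPARK_HOME/conf/slaves" <<'EOF'
worker1.example.com
worker2.example.com
EOF

# Start the master and every worker listed in conf/slaves
"$SPARK_HOME/sbin/start-all.sh"

# Stop the whole cluster again
"$SPARK_HOME/sbin/stop-all.sh"
```

With the slaves file in place, a single invocation of the launch script brings up the entire cluster, instead of starting each worker by hand.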