Running Spark on YARN
In the previous recipe, we looked at how to use Spark's built-in cluster manager; in this recipe, we will explore how to use YARN as the cluster manager to run Spark applications.
Getting ready
To perform this recipe, you should have a running Hadoop cluster. You should also have performed the previous recipe.
How to do it...
As mentioned in the previous recipe, we can either use Spark's built-in cluster manager, or we can use an external cluster manager such as YARN. In order to execute a Spark application on YARN, we need to edit SPARK_HOME/conf/spark-env.sh and add the following properties to it:
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
Here, /usr/local/hadoop/etc/hadoop is the directory that contains the Hadoop and YARN configuration files.
Now, let's execute the same Spark application on YARN using the following command:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \ ...
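The command above is truncated; as a sketch, a complete invocation might look like the following. This assumes SPARK_HOME is the current directory and uses Spark's bundled SparkPi example; the JAR filename and version, the executor counts, and the memory sizes are placeholders that you would adjust for your own installation and cluster.

```shell
# Point Spark at YARN instead of its built-in cluster manager.
# --deploy-mode cluster runs the driver inside YARN; use "client"
# to keep the driver on the submitting machine.
# The examples JAR path/version below is a placeholder; check your
# SPARK_HOME/examples/jars directory for the actual filename.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-memory 1g \
  examples/jars/spark-examples_2.12-3.0.0.jar 10
```

With --deploy-mode cluster, the application's output appears in the YARN container logs rather than on your terminal, so you would retrieve the result of SparkPi with yarn logs -applicationId &lt;app ID&gt;.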