Running Spark standalone
Spark can be executed in various modes. To get started, we will look at how to install Apache Spark on a standalone machine.
Getting ready
To perform this recipe, you should download the latest version of Spark. For this recipe, I am using Apache Spark 1.6.0. You can visit the download page at http://spark.apache.org/downloads.html.
How to do it...
Apache Spark is a computation engine. It ships with a built-in cluster manager, but it can also run on top of external cluster managers such as YARN and Mesos. In this recipe, we are going to use the built-in cluster manager provided by Spark:
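The cluster manager is selected through the master URL passed to spark-shell or spark-submit. The following fragment is illustrative only; it requires a working Spark installation, and the host names are placeholders:

```shell
# Illustrative master URLs (host names are placeholders):
#   spark-shell --master local[2]            # local mode, 2 threads, no cluster manager
#   spark-shell --master spark://host:7077   # Spark's built-in standalone cluster manager
#   spark-shell --master yarn                # Hadoop YARN
#   spark-shell --master mesos://host:5050   # Apache Mesos
```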
1. Copy the downloaded Spark binary to a desired location.
2. Extract the tarball:
   $ sudo tar -xzf spark-1.6.0-bin-hadoop2.6.tgz
3. Rename the extracted folder to spark for ease of use:
   $ sudo mv spark-1.6.0-bin-hadoop2.6 spark
4. Add the following environment variables to ~/.bashrc:
   export SPARK_HOME=/usr/local/spark
   export PATH=$PATH:$SPARK_HOME/bin
5. Source ~/.bashrc to make the changes effective:
   $ source ~/.bashrc
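Once ~/.bashrc has been sourced, you can sanity-check the variables from the same shell. A minimal sketch, using the values set in this recipe:

```shell
# Set the same variables as in ~/.bashrc (values from this recipe)
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin

# Confirm that Spark's bin directory is on the PATH
case ":$PATH:" in
  *":$SPARK_HOME/bin:"*) echo "Spark is on the PATH" ;;
  *)                     echo "Spark is NOT on the PATH" ;;
esac
# prints "Spark is on the PATH"
```

If the check fails, the usual cause is that ~/.bashrc was edited but not re-sourced in the current shell.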
In case you...