Getting started with Spark
In this section, we will run Apache Spark in local or standalone mode. First we will set up Scala, which is a prerequisite for Apache Spark. Once Scala is in place, we will set up and run Apache Spark and perform some basic operations on it. So let's start.
Since Apache Spark is written in Scala, it needs Scala to be set up on the system. You can download Scala from http://www.scala-lang.org/download/ (we will set up Scala 2.11.8 in the following examples).
Once Scala is downloaded, we can set it up on a Linux system as follows:
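A minimal sketch of the setup steps is shown here, assuming the scala-2.11.8.tgz tarball has already been downloaded to the current directory and that /usr/local is the chosen install location (the filename and target directory are assumptions, not fixed by the text):
# Extract the downloaded archive (filename assumed to be scala-2.11.8.tgz)
tar -xzf scala-2.11.8.tgz
# Move the extracted directory to /usr/local (an illustrative install location)
sudo mv scala-2.11.8 /usr/local/scala-2.11.8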
Also, it is recommended to set the SCALA_HOME environment variable and add the Scala binaries to the PATH variable. You can set them in the .bashrc file or the /etc/environment file as follows:
export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$PATH:/usr/local/scala-2.11.8/bin
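To verify the setup, you can reload the shell configuration and check the installed Scala version (a quick sanity check; the exact output may vary with your shell and system):
# Reload .bashrc so the new variables take effect in the current shell
source ~/.bashrc
# Confirm that Scala is on the PATH; this should report version 2.11.8
scala -version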
Now that we have set up the Scala environment successfully, it is time to download Apache Spark. You can download...