Installing Apache Spark
To use Apache Spark, you must first install it. The process is very easy, because its requirements are not the traditional Hadoop ones, which call for Apache ZooKeeper and Hadoop HDFS.
Apache Spark is able to work in a standalone single-node installation, similar to an Elasticsearch one.
Getting ready
You need a Java Virtual Machine installed: generally, version 8.x or above is used.
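To check which Java version is available on your machine, you can run:
java -version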
How to do it...
To install Apache Spark, we will perform the following steps:
We will download a binary distribution from http://spark.apache.org/downloads.html. For generic usage, I suggest you download a standard version via:
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
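Optionally, you can check that the archive was downloaded correctly by computing its checksum and comparing it with the one published alongside the release on the downloads page (depending on your platform, the tool may be shasum -a 512 instead):
sha512sum spark-2.1.0-bin-hadoop2.7.tgz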
Now we can extract the Spark distribution via:
tar xfvz spark-2.1.0-bin-hadoop2.7.tgz
Now, we can check that Apache Spark is working by executing one of the bundled examples:
cd spark-2.1.0-bin-hadoop2.7
./bin/run-example SparkPi
The result will be similar to a standard Spark run log, ending with a line such as Pi is roughly 3.14...
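If you prefer an interactive check, a minimal sketch is to start the Spark shell and run a one-line job; sc is the SparkContext that the shell creates for you automatically:
./bin/spark-shell
Then, at the scala> prompt:
sc.parallelize(1 to 100).sum() // distributes the numbers 1 to 100 and sums them, returning 5050.0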