Apache Spark is a cross-platform framework that can be deployed on Linux, Windows, or macOS, as long as Java is installed on the machine. In this section, we will look at how to install Apache Spark.
Apache Spark can be downloaded from http://spark.apache.org/downloads.html
First, let's look at the prerequisites that must be available on the machine (a quick way to verify them is shown after this list):
- Java 8+ (mandatory as all Spark software runs as JVM processes)
- Python 3.4+ (optional and used only when you want to use PySpark)
- R 3.1+ (optional and used only when you want to use SparkR)
- Scala 2.11+ (optional and used only when you want to write Spark applications in Scala)
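
Once Spark is installed, one quick way to confirm the environment is to run a few lines in spark-shell, the interactive Scala shell that ships with every Spark distribution. This is a minimal sanity-check sketch, not an official installation step:

```scala
// Run these lines inside spark-shell after installation.
// `sc` is the SparkContext that spark-shell creates automatically.
println(s"Java:  ${System.getProperty("java.version")}")
println(s"Scala: ${scala.util.Properties.versionString}")
println(s"Spark: ${sc.version}")
```

If all three versions print without errors, the JVM and Spark itself are working; the Python and R prerequisites only matter if you go on to use pyspark or sparkR.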
Spark can be deployed in three primary modes, which we will look at; the sketch after this list shows how each mode is selected through the master URL:
- Spark standalone
- Spark on YARN
- Spark on Mesos
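
To make the distinction concrete, the sketch below shows how a Spark application selects its deployment mode through the master URL passed to the SparkSession builder. The host names are hypothetical placeholders, and the ports shown are the usual defaults; treat this as an illustration of the three modes rather than a complete deployment guide:

```scala
import org.apache.spark.sql.SparkSession

object DeployModeExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DeployModeExample")
      // Spark standalone: point at the standalone master
      // ("master-host" is a placeholder; 7077 is the default port).
      .master("spark://master-host:7077")
      // Spark on YARN:  use .master("yarn"); the cluster is
      //                 located via the HADOOP_CONF_DIR setting.
      // Spark on Mesos: use .master("mesos://mesos-host:5050").
      .getOrCreate()

    // Confirm which master this application is connected to.
    println(s"Running against master: ${spark.sparkContext.master}")
    spark.stop()
  }
}
```

In practice, the master URL is usually supplied externally via `spark-submit --master ...` rather than hard-coded, so the same application jar can run unchanged under any of the three modes.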