Installing Apache Spark
To use Apache Spark, we need to install it. The process is straightforward because Spark does not require the traditional Hadoop stack, which depends on Apache ZooKeeper and the Hadoop Distributed File System (HDFS).
Apache Spark can run as a standalone single-node installation, similar to Elasticsearch.
Getting ready
You need a Java Virtual Machine (JVM) installed; version 8.x or above is generally used. Note that the highest Java version supported by the Spark 3.2.x release used in this recipe is 11.x.
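Before installing Spark, you can check which Java version is active on your path; the java launcher reports it directly:
java -version
The reported major version should be 8 or 11 for the Spark release used in this recipe.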
How to do it...
To install Apache Spark, we will perform the following steps:
- Download a binary distribution from https://spark.apache.org/downloads.html. For generic usage, I suggest downloading a standard prebuilt package using the following command:
wget https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
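Optionally, you can check the integrity of the downloaded archive by computing its SHA-512 digest and comparing it manually against the checksum published next to the download link on the Apache site (on macOS, use shasum -a 512 instead):
sha512sum spark-3.2.1-bin-hadoop3.2.tgz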
- Now, we can extract the Spark distribution using tar, as follows:
tar xfvz spark-3.2.1-bin-hadoop3.2.tgz
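If you want to call the Spark scripts from outside the extracted directory, a minimal sketch for a Bash-like shell is to export SPARK_HOME (the variable the Spark launch scripts honor) and extend your PATH, assuming the archive was extracted into the current directory:
export SPARK_HOME=$PWD/spark-3.2.1-bin-hadoop3.2
export PATH=$SPARK_HOME/bin:$PATH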
- Now, we can test whether Apache Spark is working by executing...
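For example, the binary distribution ships an interactive shell and bundled example jobs; either of the following commands exercises a local Spark instance (run them from the extracted directory):
./bin/spark-shell
./bin/run-example SparkPi 10
The spark-shell command drops you into a Scala REPL with a preconfigured SparkSession, while run-example submits the bundled SparkPi job, which approximates the value of pi; the trailing 10 is the number of partitions to use.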