This section walks through the steps in the recipe to install Java, Anaconda (with Python 3), and Spark on Ubuntu Desktop:
- Install Java on Ubuntu through the terminal application, which can be found by searching for it and then locking it to the launcher on the left-hand side, as seen in the following screenshot:
- Perform an initial test for Java on the virtual machine by executing the following command at the terminal. If Java has not been installed yet, the terminal will report that the java command could not be found:
$ java -version
- Execute the following four commands at the terminal to install Java:
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
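- Note that the webupd8team PPA distributes the Oracle installer. On newer Ubuntu releases, where this PPA is no longer maintained, OpenJDK 8 is a commonly used substitute that also works with Spark 2.2; a hedged alternative, not part of the original recipe:
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk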
- After accepting the necessary license agreements for Oracle, perform a secondary test of Java on the virtual machine by executing java -version once again in the terminal. A successful installation of Java will display the following output in the terminal:
$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
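- Some Spark configurations also expect the JAVA_HOME environment variable to be set. The following is a minimal sketch; the path assumes the Oracle installer placed Java under /usr/lib/jvm/java-8-oracle, so verify the actual location first with readlink -f $(which java):
$ export JAVA_HOME=/usr/lib/jvm/java-8-oracle
$ echo 'export JAVA_HOME=/usr/lib/jvm/java-8-oracle' >> ~/.bashrc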
- Next, install the most recent version of Anaconda. Current versions of Ubuntu Desktop come preinstalled with Python. While it is convenient that Python comes preinstalled with Ubuntu, the installed version is Python 2.7, as seen in the following output:
$ python --version
Python 2.7.12
- At the time of writing, the current version of Anaconda is v4.4, which ships with Python v3.6. Download the Linux installer from the Anaconda download page; once downloaded, view the Anaconda installation file by accessing the Downloads folder using the following command:
$ cd Downloads/
~/Downloads$ ls
Anaconda3-4.4.0-Linux-x86_64.sh
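- Optionally, verify the integrity of the downloaded installer before running it. The command below prints the file's MD5 hash, which can be compared against the hash published in the Anaconda installer archive listing:
~/Downloads$ md5sum Anaconda3-4.4.0-Linux-x86_64.sh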
- Once in the Downloads folder, initiate the installation for Anaconda by executing the following command:
~/Downloads$ bash Anaconda3-4.4.0-Linux-x86_64.sh
Welcome to Anaconda3 4.4.0 (by Continuum Analytics, Inc.)
In order to continue the installation process, please review the license agreement.
Please, press ENTER to continue
- Once the Anaconda installation is complete, restart the Terminal application to confirm that Python 3 is now the default Python environment through Anaconda by executing python --version in the terminal:
$ python --version
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
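- If you prefer not to restart the terminal, reloading the shell configuration achieves the same effect, assuming the installer was allowed to add anaconda3/bin to the PATH in ~/.bashrc during setup; which python should then point at the Anaconda installation (the username below is a placeholder):
$ source ~/.bashrc
$ which python
/home/<username>/anaconda3/bin/python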
- Python 2 is still available on the system, but requires an explicit call when executing a script, as seen in the following command:
~$ python2 --version
Python 2.7.12
- Visit the following website to begin the Spark download and installation process:
https://spark.apache.org/downloads.html
- Select the download link. The following file will be downloaded to the Downloads folder in Ubuntu:
spark-2.2.0-bin-hadoop2.7.tgz
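- Alternatively, if downloading through the browser is inconvenient (for example, on a headless virtual machine), the same archive can be fetched from the command line; the URL below assumes the standard Apache archive layout for older Spark releases:
$ wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz -P ~/Downloads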
- View the file at the terminal by executing the following commands:
$ cd Downloads/
~/Downloads$ ls
spark-2.2.0-bin-hadoop2.7.tgz
- Extract the tgz file by executing the following command (z decompresses the gzip archive, x extracts it, v lists each file as it is extracted, and f names the input file):
~/Downloads$ tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
- Another look at the Downloads directory using ls shows both the tgz file and the extracted folder:
~/Downloads$ ls
spark-2.2.0-bin-hadoop2.7 spark-2.2.0-bin-hadoop2.7.tgz
- Move the extracted folder from the Downloads folder to the Home folder by executing the following command:
~/Downloads$ mv spark-2.2.0-bin-hadoop2.7 ~/
~/Downloads$ ls
spark-2.2.0-bin-hadoop2.7.tgz
~/Downloads$ cd
~$ ls
anaconda3 Downloads Pictures Templates
Desktop examples.desktop Public Videos
Documents Music spark-2.2.0-bin-hadoop2.7
- Now, the spark-2.2.0-bin-hadoop2.7 folder has been moved to the Home folder, which can be viewed when selecting the Files icon on the left-hand side toolbar, as seen in the following screenshot:
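- Optionally, setting SPARK_HOME and extending the PATH makes Spark launchable from any directory; a minimal convenience sketch, assuming the extracted folder sits in the Home folder as above:
~$ echo 'export SPARK_HOME=$HOME/spark-2.2.0-bin-hadoop2.7' >> ~/.bashrc
~$ echo 'export PATH=$SPARK_HOME/bin:$PATH' >> ~/.bashrc
~$ source ~/.bashrc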
- Spark is now installed. Initiate Spark by executing the following commands at the terminal:
~$ cd ~/spark-2.2.0-bin-hadoop2.7/
~/spark-2.2.0-bin-hadoop2.7$ ./bin/pyspark
- Perform a final test at the PySpark prompt by executing the following command to confirm that the SparkContext is driving the cluster in the local environment:
>>> sc
<SparkContext master=local[*] appName=PySparkShell>
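- As an additional sanity check (a hedged example that is not part of the original recipe), run a small job through the SparkContext; summing the integers 0 through 99 should return the following:
>>> sc.parallelize(range(100)).sum()
4950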