Spark is an Apache project that provides an open source framework geared toward cluster computing. For our purposes, it provides APIs, such as PySpark for Python, that can be used to access Hadoop data sets.
Adding the Spark engine
How to do it...
We install the Spark engine and execute a Spark script in Jupyter to demonstrate that it works, as follows.
Installing the Spark engine
Generally, installing Spark involves two steps:
- Installing Spark (for your environment)
- Connecting Spark to your environment (whether standalone or clustered)
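For a single-machine (standalone) setup, the first step can be as simple as installing the PySpark package, which bundles the Spark runtime. This is a sketch assuming a pip-based Python environment; clustered installs instead use a full Spark distribution configured per node.

```shell
# Install Spark via the PySpark package (bundles the Spark runtime).
pip install pyspark

# Verify the installation by printing the Spark version.
python -c "import pyspark; print(pyspark.__version__)"
```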
The Spark installations are...