Apache Spark
One of the tools we will be using is Apache Spark. Spark is an open source toolset for cluster computing. While we will not be using a cluster, Spark is typically run on a larger set of machines that operate in parallel to analyze a big dataset. Installation instructions are available at https://www.dataquest.io/blog/pyspark-installation-guide.
Installing Spark on macOS
Up-to-date instructions for installing Spark are available at https://medium.freecodecamp.org/installing-scala-and-apache-spark-on-mac-os-837ae57d283f. The main steps are:
- Get Homebrew from http://brew.sh. If you are doing software development on macOS, you will likely already have Homebrew.
- Install xcode-select: xcode-select is used for different languages. For Spark we use Java, Scala and, of course, Spark as follows:
xcode-select --install
Again, it is likely that you will already have this for other software development tasks.
- Use Homebrew to install Java:
brew cask install java
- Use Homebrew to install...
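Before running the install commands, it can help to see which prerequisites are already present. The following is a minimal sketch rather than part of the referenced instructions: the helper name check_tool and the list of tools it loops over are assumptions made for illustration, based on the steps listed here.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper name, not part of Homebrew or Spark.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# The tool list is an assumption based on the installation steps above.
for tool in brew xcode-select java; do
    check_tool "$tool"
done
```

A "found" result only means the command is on the PATH. It does not confirm that the installed version is a working or suitable one, so the install steps may still be needed.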