Let's begin by setting up Spark on our local machine. Please refer to the official Apache Spark site (https://spark.apache.org/downloads.html) for details on downloading and installing Spark. At the time of this writing, Spark 2.4.0 is the latest release, so we will download and install this version. The following is a screenshot of the Spark download web page:
Note the following options when selecting an appropriate download package:
- Spark release: Choose the latest stable release (2.4.0 at the time of this writing).
- Package type: Choose the Pre-built for Apache Hadoop 2.7 and later option.
- Download link: Clicking this link takes you to the Apache Download Mirrors page. Select the mirror most suitable for you; in most cases, the suggested mirror works fine.
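The release and package type chosen above together determine the name of the file you end up downloading. The following minimal sketch shows that mapping. The helper names are hypothetical, and the base URL assumes the Apache release archive (archive.apache.org) rather than a mirror, whose host and path will differ.

```python
# Hypothetical helpers: build the file name and archive URL of a pre-built Spark package.
# Assumption: the Apache release archive layout; mirror sites use different hosts and paths.
ARCHIVE_BASE = "https://archive.apache.org/dist/spark"


def spark_package_name(spark_version: str, hadoop_version: str) -> str:
    """Return the file name of a Spark binary pre-built for the given Hadoop line."""
    return f"spark-{spark_version}-bin-hadoop{hadoop_version}.tgz"


def spark_download_url(spark_version: str, hadoop_version: str) -> str:
    """Return the archive URL of the pre-built package for the given versions."""
    name = spark_package_name(spark_version, hadoop_version)
    return f"{ARCHIVE_BASE}/spark-{spark_version}/{name}"


if __name__ == "__main__":
    # Spark 2.4.0, pre-built for Apache Hadoop 2.7 and later
    print(spark_download_url("2.4.0", "2.7"))
```

Running it prints `https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz`, which is the same package you get from a mirror after choosing release 2.4.0 and the "Pre-built for Apache Hadoop 2.7 and later" package type.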
The following is another screenshot of the Apache...