Technical requirements
The following are the technical requirements for this chapter:
- Python 3.7 or later installed on your computer
- An Apache Spark single-node cluster
- PySpark installed on top of Python 3.7 or later for driver program development
Note
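The Python version that Apache Spark uses on its worker processes has to match the Python version that is used to run the driver program, at least at the major and minor level (for example, 3.8 on both sides). PySpark refuses to run a job when the two differ.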
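To make the version requirement concrete, here is a minimal sketch of how you might check your interpreter before starting. It uses only the standard library; the helper names `meets_minimum` and `same_minor_version` are hypothetical and are not part of PySpark or of the chapter's sample code.

```python
import sys

# Minimum version stated in this chapter's requirements.
MIN_VERSION = (3, 7)


def meets_minimum(version=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of `version` is at least `minimum`.

    Hypothetical helper: defaults to the interpreter running this script.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def same_minor_version(driver_version, worker_version):
    """Return True if two versions agree on their major and minor numbers.

    Hypothetical helper: the patch level (third number) is ignored.
    """
    return tuple(driver_version[:2]) == tuple(worker_version[:2])


if __name__ == "__main__":
    print("Driver Python: %d.%d" % sys.version_info[:2])
    print("Meets minimum:", meets_minimum())
```

If the driver and the cluster end up on different interpreters, Spark's `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables are the usual way to point both sides at the same Python executable.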
The sample code for this chapter can be found at https://github.com/PacktPublishing/Python-for-Geeks/tree/master/Chapter08.
We will start our discussion by looking at the cluster options available for parallel processing in general.