Case studies of using Apache Spark and PySpark
In previous sections, we covered the fundamental concepts and architecture of Apache Spark and PySpark. In this section, we will discuss two case studies that implement interesting and popular applications of Apache Spark.
Case study 1 – Pi (π) calculator on Apache Spark
We will calculate Pi (π) using the Apache Spark cluster running on our local machine. Pi is equal to the area of a circle with a radius of 1, since the area of a circle is πr². Before discussing the algorithm and the driver program for this application, it is important to introduce the Apache Spark setup used for this case study.
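To make the "area of a unit circle" idea concrete, here is a minimal, single-machine sketch that does not use Spark at all. It assumes a Monte Carlo approach, a common way to estimate π that is not necessarily the exact algorithm this case study uses: points are sampled uniformly in the unit square, and because the quarter of the unit circle inside that square has area π/4, the fraction of points falling inside it approximates π/4. The function name `estimate_pi` and its `seed` parameter are hypothetical, chosen only for illustration.

```python
import random


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling points uniformly in the unit square.

    The quarter of the unit circle lying in [0, 1] x [0, 1] has area pi / 4,
    so the fraction of sampled points inside it approximates pi / 4.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point lies inside the quarter circle
            inside += 1
    # Scale the observed fraction (about pi / 4) back up to pi
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

The sampling loop is the part that benefits from a cluster: each sample is independent, so in PySpark the same idea could be expressed by distributing the samples with `SparkContext.parallelize`, keeping the points inside the circle with `RDD.filter`, and counting them with `RDD.count`, leaving only the final multiplication by 4 to the driver.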
Setting up the Apache Spark cluster
In all previous code examples, we used PySpark installed locally on our machine, without a cluster. For this case study, we will set up an Apache Spark cluster using multiple virtual machines. Many virtualization tools are available, such as VirtualBox, and any of these tools will...