For this recipe, we will start creating an RDD by generating the data within the PySpark. To create RDDs in Apache Spark, you will need to first install Spark as shown in the previous chapter. You can use the PySpark shell and/or Jupyter notebook to run these code samples.
Creating RDDs
Getting readyÂ
We require a working installation of Spark. This means that you would have followed the steps outlined in the previous chapter. As a reminder, to start PySpark shell for your local Spark cluster, you can run this command:
./bin/pyspark --master local[n]
Where n is the number of cores.Â