In this chapter, we showed examples of how to write Spark code in Python and R, the two most popular programming languages in the data science community.
We covered the motivation for using PySpark and SparkR for big data analytics with nearly the same ease as in Java and Scala. We discussed how to set up these APIs in popular IDEs, such as PyCharm for PySpark and RStudio for SparkR, and showed how to work with DataFrames and RDDs from these environments. Furthermore, we discussed how to execute Spark SQL queries from PySpark and SparkR, and how to perform some analytics with visualization of the dataset. Finally, we saw how to use UDFs with PySpark through examples.
Thus, we have discussed several aspects of two of Spark's APIs: PySpark and SparkR. There is much more to explore; interested readers should refer to their websites for...