To begin, install a few packages and Spark itself by running the following code. Downloading Spark can take some time:
# Install the required packages from CRAN
install.packages(c("dplyr", "sparklyr", "DAAG"))
library(sparklyr)
library(dplyr)
# Download and install a local copy of Spark
spark_install()
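Because the install commands above download everything again each time they run, it can help to guard them. The following is only a minimal sketch: `missing_packages` is a hypothetical helper written for this illustration (it is not part of sparklyr), and the CRAN mirror URL is an assumed choice that you may want to change.

```r
# Hypothetical helper (not part of sparklyr): return the packages from
# `pkgs` that are not in the `installed` vector.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Install only the packages that are not already present.
to_install <- missing_packages(c("dplyr", "sparklyr", "DAAG"))
if (length(to_install) > 0) {
  # Assumption: the RStudio-hosted CRAN mirror; any CRAN mirror works.
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

library(sparklyr)

# spark_installed_versions() lists the Spark versions already installed
# locally, one row per version; download Spark only if none is found.
if (nrow(spark_installed_versions()) == 0) {
  spark_install()
}
```

With this guard, running the script a second time should skip both the package installation and the Spark download.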
The DAAG package contains the dataset we are going to use. This chapter is divided into five sections plus this introduction. The first section teaches you how to manipulate Spark data using dplyr and SQL queries. The second section brings Spark data into R for analysis and visualization. The third section shows how to use the Spark or H2O machine learning algorithms. The fourth section presents the Spark API. The final section shows the Spark connection in the RStudio IDE.