Once you have finished the installation described in this chapter's introduction, let's create a remote dplyr data source for the Spark cluster. To do this, use the spark_connect function, as shown:
library(sparklyr)
sc <- spark_connect(master = "local")
This will create a local Spark cluster on your computer; you can see it in RStudio's Connections tab, next to your Environment pane. To disconnect, use the spark_disconnect(sc) function. Stay connected for now, and copy a couple of datasets from an R package into the cluster:
library(DAAG)
# Copy each data frame into Spark and register it under the given table name
dt_sugar <- copy_to(sc, sugar, "SUGAR")
dt_stVincent <- copy_to(sc, stVincent, "STVINCENT")
The preceding code uploads the DAAG::sugar and DAAG::stVincent data frames into your connected Spark cluster. It also creates table references, saved in dt_sugar and dt_stVincent, which behave like remote dplyr tables backed by Spark.
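To check that the upload worked and to see what these references give you, here is a minimal sketch of querying one of them with dplyr verbs; it assumes the sugar data keeps the weight and trt columns documented in DAAG:

library(dplyr)
# List the tables registered in the cluster; SUGAR and STVINCENT should appear
src_tbls(sc)
# dplyr verbs on a remote table are translated to Spark SQL, not run in R
sugar_summary <- dt_sugar %>%
  group_by(trt) %>%
  summarise(mean_weight = mean(weight, na.rm = TRUE))
# Inspect the generated SQL, then pull the result back into a local data frame
show_query(sugar_summary)
collect(sugar_summary)

Note that the query is evaluated lazily: until you print or collect() the result, sugar_summary is only a description of work for Spark to do.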