Creating a SparkR standalone application from RStudio
In this recipe, we'll look at the process of writing and executing a standalone application in SparkR.
Getting ready
To step through this recipe, you will need a running Spark cluster, either in pseudo-distributed mode or in one of the distributed modes, that is, standalone, YARN, or Mesos. You will also need RStudio installed. Please refer to the Installing R recipe for details on the installation of R.
How to do it…
In this recipe, we'll create a standalone application using Spark 1.6.0 and Spark 2.0.2:
- Before working with SparkR, make sure that `SPARK_HOME` is set in the environment, as follows:

  ```r
  if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
    Sys.setenv(SPARK_HOME = "/home/padmac/bigdata/spark-1.6.0-bin-hadoop2.6")
  }
  ```
- Now, load the `SparkR` package and invoke `sparkR.init` as follows:

  ```r
  library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
  # With no arguments, sparkR.init() starts Spark in local mode; pass
  # master = "<cluster URL>" to connect to a standalone, YARN, or Mesos cluster
  sc <- sparkR.init()
  ```
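The preceding steps target Spark 1.6.0, where `sparkR.init` returns a SparkContext. In Spark 2.0.2, the entry point is `sparkR.session` instead. The following is a minimal sketch of the same standalone setup for Spark 2.0.2; the installation path and application name here are illustrative assumptions, not values from the recipe:

```r
# Point SPARK_HOME at the Spark 2.0.2 build (this path is an assumption)
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/padmac/bigdata/spark-2.0.2-bin-hadoop2.6")
}

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# sparkR.session() replaces sparkR.init() in Spark 2.x;
# "local[*]" runs Spark locally with one worker thread per core
sparkR.session(master = "local[*]", appName = "SparkR-standalone")

# Sanity check: turn a built-in R data frame into a SparkDataFrame
df <- as.DataFrame(faithful)
head(df)

# Release resources when the application is done
sparkR.session.stop()
```

Once this script is sourced from the RStudio console, subsequent SparkR API calls (such as `head(df)` above) run against the active session, and `sparkR.session.stop()` releases the cluster resources when the application ends.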