SparkR DataFrame operations
SparkR DataFrames support a number of operations for structured data processing. In this recipe, we'll walk through a good number of examples, such as selection, grouping, and aggregation.
Getting ready
To step through this recipe, you will need a running Spark cluster, either in pseudo-distributed mode or in one of the distributed modes, that is, standalone, YARN, or Mesos. Also, install RStudio. Please refer to the Installing R recipe for details on the installation of R and RStudio, and to the Creating SparkR DataFrames recipe to get acquainted with creating DataFrames from a variety of data sources.
How to do it…
In this recipe, we'll see how to perform various operations on SparkR DataFrames:
- Let's see how to select a column from a DataFrame. First, load SparkR and initialize a context (a complete selection sketch follows the code below):
# Load the SparkR package bundled with the Spark installation
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# Initialize a local SparkContext; the sparkEnvir entry was truncated in the
# original text -- spark.driver.memory = "2g" is used here as a placeholder
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "2g"))
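With the context in place, a column can be selected using select(). The following is a minimal sketch based on the SparkR 1.x API (sparkRSQL.init and createDataFrame); the use of R's built-in faithful dataset and the eruptions column is an assumption for illustration, and your DataFrame and column names may differ:

sqlContext <- sparkRSQL.init(sc)

# Create a SparkR DataFrame from a local R data.frame (faithful is used here
# only as an example; any of the data sources from the earlier recipe work too)
df <- createDataFrame(sqlContext, faithful)

# Select a single column by referencing it on the DataFrame
head(select(df, df$eruptions))

# Columns can also be selected by name
head(select(df, "eruptions"))

Note that select() returns another distributed DataFrame; head() is used here only to bring a few rows back to the driver for inspection.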