One of the first things I do upon creating a new data object is to run summary statistics. There is a Spark-specific version of the base R summary() function known as describe(). You can also use the generic summary() function; however, if you do this instead of using describe(), I would preface it with SparkR:: in order to specify which version of summary() you are using:
head(SparkR::summary(out_sd))
The output appears in a slightly different format than a summary of a native R dataframe, but it contains the basic measures you are looking for, namely count, mean, stddev, min, and max:
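If you prefer the Spark-specific call mentioned above, a minimal sketch looks like this, assuming out_sd is the SparkDataFrame created earlier:
sum_stats <- SparkR::describe(out_sd)  # returns a SparkDataFrame of count, mean, stddev, min, and max per column
head(sum_stats)                        # pull the first rows back to the driver for viewing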
We can also compare this summary with the summary of the original Pima Indians dataframe and see that the simulation has done a pretty good job of estimating the means. The number of observations is approximately 1,000 times the original size, and the ratio of diabetes to nondiabetes...
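As a rough sketch of that comparison, assuming the original Pima Indians data is held in a native R data frame named pima (a hypothetical name; substitute whatever object you loaded the data into):
summary(pima)                   # base R summary of the original data
head(SparkR::summary(out_sd))   # Spark summary of the simulated data
nrow(out_sd) / nrow(pima)       # row-count ratio; should be roughly 1,000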