Before we move on to exploring the entire Spark dataframe, we can look at some of the data already generated for positive cases. As you may recall from the prior chapter, this is stored in the Spark dataframe out_sd1.
We have generated some random sample bins specifically so that we can do some exploratory analysis.
We can use the filter command to extract random sample 1, and take the first 1,000 records:
- filter is a SparkR function that subsets a Spark dataframe by a row condition
- display is a Databricks command equivalent to the View command we used previously; you can also use the head function to limit the number of rows that are displayed
This code chunk extracts 1,000 records from the positive cases and displays them:
small_pos <- head(SparkR::filter(out_sd1,out_sd1$sample_bin==1...
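To see the filter-then-head pattern in isolation, here is a minimal sketch on a made-up dataset. It assumes a working Spark installation that SparkR can start a session against (on Databricks a session already exists). The names toy, toy_sd, bin1, and small are hypothetical stand-ins and are not part of the chapter's data; only the sample_bin column name is borrowed from out_sd1.

```r
library(SparkR)

# Assumption: a local Spark installation is available; on Databricks the
# session is already running and this call simply returns it.
sparkR.session()

# Hypothetical stand-in for out_sd1: 5,000 rows assigned at random to 5 bins.
set.seed(1)
toy <- data.frame(id = 1:5000,
                  sample_bin = sample(1:5, 5000, replace = TRUE))
toy_sd <- as.DataFrame(toy)   # local R data.frame -> Spark dataframe

# filter() keeps only the rows that satisfy the condition. The SparkR::
# prefix avoids picking up stats::filter or another package's filter by mistake.
bin1 <- SparkR::filter(toy_sd, toy_sd$sample_bin == 1)

# head() on a Spark dataframe brings at most the requested number of rows
# back to the driver as an ordinary R data.frame.
small <- head(bin1, 1000)

class(small)   # a local data.frame, not a Spark dataframe
nrow(small)    # at most 1000
```

Because head returns a local R data.frame, the result can be passed to ordinary R functions; in a Databricks notebook it can also be passed to display.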