Learning how to aggregate, group, and join data
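Another set of basic skills a data engineer needs is the ability to aggregate, group, and join data. Let’s learn how to do this using Scala in Spark! We'll start by grouping the dfDirectorByShowSelectExpr DataFrame on show_id and counting the directors associated with each show: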
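import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.count
import spark.implicits._ // enables the $"colName" syntax; assumes a SparkSession named spark

// Group rows by show and count the (non-null) directors in each group
val numDirectorsByShow: DataFrame =
  dfDirectorByShowSelectExpr
    .groupBy($"show_id")
    .agg(
      count($"director").alias("num_director")
    )

// Print the first 10 rows; a truncate value of 0 leaves column values untruncated
numDirectorsByShow.show(10, 0)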
Here is the output:
+-------+------------+
|show_id|num_director|
+-------+------------+
|s1     |1           |
|s2     |1           |
|s3     |1           |
|s4     |1           |
...
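To experiment with grouping and aggregation without the full dataset, here is a minimal, self-contained sketch. It assumes Spark 3.5.x on Scala 2.13, run with scala-cli through the directives at the top. The object name, toy rows, and titles are hypothetical stand-ins for the real data. The sketch also illustrates two details: count on a column skips nulls, so a show whose only director value is null ends up with 0, not 1, and the counts can be joined back to another DataFrame on show_id. The opening paragraph of this section lists joins among the skills covered.

```scala
//> using scala 2.13
//> using dep org.apache.spark::spark-sql:3.5.1

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{count, countDistinct}

object DirectorCountSketch {
  // Local SparkSession for experimentation; "local[*]" uses all local cores
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("director-count-sketch")
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._

  // Hypothetical toy data: one row per (show_id, director) pair.
  // s2 has two directors; s3 has no known director (a null value).
  def directorsByShow: DataFrame = Seq[(String, Option[String])](
    ("s1", Some("Director A")),
    ("s2", Some("Director B")),
    ("s2", Some("Director C")),
    ("s3", None)
  ).toDF("show_id", "director")

  // Hypothetical titles table used to demonstrate a join on show_id
  def showTitles: DataFrame = Seq(
    ("s1", "Title One"),
    ("s2", "Title Two"),
    ("s3", "Title Three")
  ).toDF("show_id", "title")

  // Same groupBy/agg pattern as above, plus a distinct count for comparison
  def numDirectorsByShow(df: DataFrame): DataFrame =
    df.groupBy($"show_id")
      .agg(
        count($"director").alias("num_director"),                  // nulls are not counted
        countDistinct($"director").alias("num_distinct_director")
      )
      .orderBy($"show_id")

  // Left join keeps every title even if it has no matching count row
  def withTitles(counts: DataFrame, titles: DataFrame): DataFrame =
    titles.join(counts, Seq("show_id"), "left").orderBy($"show_id")

  def main(args: Array[String]): Unit = {
    val counts = numDirectorsByShow(directorsByShow)
    withTitles(counts, showTitles).show(10, 0)
    spark.stop()
  }
}
```

Running the main method prints one row per show: s1 with one director, s2 with two, and s3 with zero, because its only director value is null. On Java 17 or later, running Spark outside spark-submit may require extra --add-opens/--add-exports JVM flags.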