Learning how to aggregate, group, and join data
Data engineers also need to be able to aggregate, group, and join data. Let’s learn how to do this using Scala in Spark! The following example groups the rows of dfDirectorByShowSelectExpr by show_id and counts the directors listed for each show:
val numDirectorsByShow: DataFrame = dfDirectorByShowSelectExpr
  .groupBy($"show_id")
  .agg(
    count($"director").alias("num_director")
  )

numDirectorsByShow.show(10, 0)
Here is the output:
+-------+------------+
|show_id|num_director|
+-------+------------+
|s1     |1           |
|s2     |1           |
|s3     |1           |
|s4     |1           |
...
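To try the same pattern without the full dataset, here is a minimal, self-contained sketch. The object name GroupByCountSketch, the helper countDirectors, and the sample rows are hypothetical and exist only for illustration; they are not part of the original example. The sketch assumes a local Spark session and uses col("...") in place of the $"..." syntax so that the helper does not depend on spark.implicits._ being in scope.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count}

object GroupByCountSketch {

  // Groups by show_id and counts the non-null director values in each group.
  def countDirectors(df: DataFrame): DataFrame =
    df.groupBy(col("show_id"))
      .agg(count(col("director")).alias("num_director"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for experimenting.
    val spark = SparkSession.builder()
      .appName("GroupByCountSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one row per (show, director) pair.
    // s2 has two directors; s3 has no director recorded (null).
    val directorsByShow: DataFrame = Seq[(String, Option[String])](
      ("s1", Some("Director A")),
      ("s2", Some("Director B")),
      ("s2", Some("Director C")),
      ("s3", None)
    ).toDF("show_id", "director")

    // Sort so the rows print in a stable order.
    countDirectors(directorsByShow)
      .orderBy(col("show_id"))
      .show(10, 0)

    spark.stop()
  }
}
```

With this sample data, s1 gets a count of 1, s2 a count of 2, and s3 a count of 0, because count on a column skips null values. If you want to count every row in a group regardless of nulls, count(lit(1)) is one option.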