Sample questions
Question 1:
Which of the following code blocks returns a DataFrame showing the mean of the salary column of the df DataFrame, grouped by the department column?
df.groupBy("department").agg(avg("salary"))
df.groupBy(col(department).avg())
df.groupBy("department").avg(col("salary"))
df.groupBy("department").agg(average("salary"))
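To make the expected result concrete, here is a minimal plain-Python sketch of what "mean of salary grouped by department" computes. It is only a model of the semantics, not Spark code: the sample rows and the helper name mean_salary_by_department are hypothetical and do not come from the question.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for df: (department, salary).
rows = [("eng", 100.0), ("eng", 200.0), ("ops", 50.0)]


def mean_salary_by_department(rows):
    """Return {department: mean salary}, one entry per distinct department."""
    # Accumulate a running [sum, count] pair for each department.
    totals = defaultdict(lambda: [0.0, 0])
    for department, salary in rows:
        totals[department][0] += salary
        totals[department][1] += 1
    # Divide each sum by its count to get the per-group mean.
    return {dept: total / count for dept, (total, count) in totals.items()}


result = mean_salary_by_department(rows)
print(result)
```

A correct code block should produce one row per department, each carrying the average of that department's salaries.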
Question 2:
Which of the following code blocks returns unique values across all values in the state and department columns in df?
df.select(state).join(transactionsDf.select('department'), col(state)==col('department'), 'outer').show()
df.select(col('state'), col('department')).agg({'*': 'count'}).show()
df.select('state', 'department').distinct().show()
df.select('state').union(df.select('department')).distinct().show()
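The wording "unique values across all values in both columns" is easy to confuse with "unique combinations of the two columns". The plain-Python sketch below contrasts the two readings. It models the semantics only and is not Spark code; the sample rows and both helper names are hypothetical.

```python
# Hypothetical sample rows standing in for df: (state, department).
rows = [("CA", "sales"), ("NY", "sales"), ("CA", "hr")]


def unique_values_across_columns(rows):
    """Pool the values of both columns into one collection, then deduplicate."""
    states = [state for state, _ in rows]
    departments = [department for _, department in rows]
    return set(states + departments)


def distinct_pairs(rows):
    """Deduplicate whole (state, department) rows instead of single values."""
    return set(rows)


values = unique_values_across_columns(rows)
pairs = distinct_pairs(rows)
print(sorted(values))
print(sorted(pairs))
```

The question asks for the first result: a single column of values drawn from both state and department, with duplicates removed. The second result keeps two columns and deduplicates row combinations instead.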
...