Spark SQL optimizer's heuristics rules can transform a SELECT statement into a query plan with the following characteristics:
- The filter operator and project operator are pushed down below the join operator, that is, both the filter and project operators are executed before the join operator
- Without subquery block, the join operator is pushed down below the aggregate operator for a select statement, that is, a join operator is usually executed before the aggregate operator
With this observation, the biggest benefit we can get from CBO is multi-way join ordering optimization. Using a dynamic programming technique, we try to get the globally optimal join order for a multi-way join query.
For more details on multi-way join reordering in Spark 2.2, refer to https://spark-summit.org/2017/events/cost-based-optimizer-in-apache...