Looking into the query execution plan
When the Spark optimizer executes a query written with the DataFrame or Spark SQL API, it generates an execution plan, and it is important to understand that plan and how to view its different stages.
In this recipe, we will learn how Spark builds the logical and physical plans and the different stages involved. By the end of this recipe, you will have generated an execution plan using the DataFrame API or the Spark SQL API and will have a fair understanding of the stages involved.
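Before diving in, the following is a minimal sketch (not taken from the recipe notebook) of how a plan can be displayed with explain() in PySpark. It assumes an active SparkSession, as you would have in a Databricks notebook; the sample data and column names are hypothetical, and the recipe itself works with the customer CSV files described in the Getting ready section:

from pyspark.sql import SparkSession

# In a Databricks notebook, spark is already available; this line is only
# needed when running the sketch outside Databricks.
spark = SparkSession.builder.appName("ExecutionPlanDemo").getOrCreate()

# Hypothetical sample data standing in for the customer CSV files.
df = spark.createDataFrame(
    [(1, "US", 120.0), (2, "UK", 80.0), (3, "US", 45.0)],
    ["customer_id", "country", "amount"],
)

result = df.filter(df.country == "US").groupBy("country").sum("amount")

# Default mode prints the physical plan only.
result.explain()

# extended=True prints all stages: parsed logical, analyzed logical,
# optimized logical, and physical plans.
result.explain(extended=True)

# The same works through the Spark SQL API.
df.createOrReplaceTempView("customers")
spark.sql(
    "SELECT country, SUM(amount) FROM customers GROUP BY country"
).explain(True)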
Getting ready
You can follow along by running the steps in the 3-4.Query Execution Plan notebook in your local cloned repository, which can be found in the Chapter03 folder (https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Chapter03).
Upload the csvFiles folder, found in the Common/Customer folder (https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Common/Customer/csvFiles), to your ADLS Gen-2 account, placing it in the rawdata filesystem, inside...