Deep diving into schema inference
In this recipe, you will learn about the benefits of explicitly specifying a schema when reading data in any file format from an ADLS Gen-2 or Azure Blob storage account.
By the end of this recipe, you will have learned how Spark executes a query when a schema is inferred versus explicitly specified.
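As a rough sketch of the two read styles this recipe compares, the following Python snippet defines one reader that asks Spark to infer the schema and one that supplies it up front as a DDL-formatted string. The column names and types, the helper names (to_ddl, read_inferred, read_explicit), and the path you pass in are assumptions made for illustration and are not taken from the recipe's notebook; adjust them to match the header of your own Customer CSV files.

```python
# Hypothetical column list for the Customer CSV files; replace with the real header.
CUSTOMER_COLUMNS = [
    ("C_CUSTKEY", "INT"),
    ("C_NAME", "STRING"),
    ("C_MKTSEGMENT", "STRING"),
    ("C_ACCTBAL", "DOUBLE"),
]


def to_ddl(columns):
    """Build a DDL-formatted schema string such as 'a INT, b STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def read_inferred(spark, path):
    # inferSchema asks Spark to read the data and guess each column's type.
    return (
        spark.read.format("csv")
        .option("header", True)
        .option("inferSchema", True)
        .load(path)
    )


def read_explicit(spark, path, columns=CUSTOMER_COLUMNS):
    # The schema is supplied up front, so no type inference is requested.
    return (
        spark.read.format("csv")
        .option("header", True)
        .schema(to_ddl(columns))
        .load(path)
    )
```

In a Databricks notebook, where the spark session is already defined, you could call read_inferred(spark, path) and read_explicit(spark, path) with the mount path of your csvFiles folder and compare the jobs each call triggers in the Spark UI.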
Getting ready
You need to ensure you have done the following before you start working on this recipe:
- Mount an ADLS Gen-2 storage account.
- Follow along by running the steps in the 3-3.Schema Inference notebook in your local cloned repository. This can be found in the Chapter03 folder (https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Chapter03).
- Upload the csvFiles folder in the Common/Customer folder (https://github.com/PacktPublishing/Azure-Databricks-Cookbook/tree/main/Common/Customer/csvFiles) to your ADLS Gen-2 account in the rawdata filesystem, inside the Customer folder.
How to do it…
You can...