Creating a solution using Databricks
Databricks is a platform for using Spark as a service. We do not need to provision master and worker nodes on virtual machines. Instead, Databricks provides us with a managed environment consisting of master and worker nodes and also manages them. We need to provide the steps and logic for the processing of data, and the rest is taken care of by the Databricks platform.
In this section, we will go through the steps of creating a solution using Databricks. We will be downloading sample data to analyze.
The sample CSV has been downloaded from https://ourworldindata.org/coronavirus-source-data, although it is also provided with the code of this book. The URL mentioned before will have more up-to-date data; however, the format might have changed, and so it is recommended to use the file available with the code samples of this book:
- The first step in creating a Databricks solution is to provision it from the Azure portal. There is a 14...