Processing data using notebooks
Databricks notebooks are the fundamental component for performing data processing tasks in Databricks. In this recipe, we will read, filter, and clean a Comma-Separated Values (CSV) file, and gain insights from it, using a Databricks notebook written in Scala.
Getting ready
Create a Databricks workspace and a cluster, as explained in the Configuring the Azure Databricks environment recipe.
Download the covid-data.csv file from https://github.com/PacktPublishing/Azure-Data-Engineering-Cookbook-2nd-edition/blob/main/chapter07/covid-data.csv.
How to do it…
Let’s process some data using Scala in a Databricks notebook by following the steps provided here:
- Log in to portal.azure.com. Go to All resources and find pactadedatabricks, the Databricks workspace created in the Configuring the Azure Databricks environment recipe. Click Launch Workspace to log in to the Databricks workspace.
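The steps that follow build up the notebook code cell by cell. As a preview, here is a minimal Scala sketch of the read/filter/clean flow; it assumes covid-data.csv has been uploaded to DBFS at /FileStore/covid-data.csv and that the file contains columns named location and new_cases (the path and column names are illustrative assumptions, not values defined by this recipe):

```scala
// A minimal sketch of the kind of notebook code this recipe builds up.
// Assumptions: covid-data.csv was uploaded to DBFS at /FileStore/covid-data.csv,
// and the file has columns named location and new_cases.
import org.apache.spark.sql.functions.col

// Read the CSV, treating the first row as column names and
// letting Spark infer each column's type.
val covidDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/FileStore/covid-data.csv")

// Clean: drop rows that have nulls in the columns we care about.
val cleanedDf = covidDf.na.drop(Seq("location", "new_cases"))

// Filter: keep one country's rows and inspect the result.
val indiaDf = cleanedDf.filter(col("location") === "India")
display(indiaDf)
```

In a Databricks Scala notebook, the spark session and the display function are available out of the box, so no setup is needed beyond attaching the notebook to a running cluster.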