Following the Spark UI
The Spark UI is a web-based user interface for monitoring Spark jobs, and it is very helpful when optimizing workloads. In this section, we will learn about the major components of the Spark UI. To begin, let's create a new Databricks cluster with the following configuration:
- Cluster Mode: Standard
- Databricks Runtime Version: 8.3 (includes Apache Spark 3.1.1, Scala 2.12)
- Autoscaling: Disabled
- Automatic Termination: After 30 minutes of inactivity
- Worker Type: Standard_DS3_v2
- Number of Workers: 1
- Spot Instances: Disabled
- Driver Type: Standard_DS3_v2
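We will create this cluster through the Databricks UI, but if you prefer to script it, the following is a minimal sketch of the same configuration expressed as a call to the Databricks Clusters REST API. The workspace URL, token, cluster name, and the exact `spark_version` string are assumptions; check what your workspace accepts.

```python
import requests

# Hypothetical values -- replace with your own workspace URL and token.
DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# Mirrors the UI configuration above (field values are assumptions;
# verify the exact spark_version string in your workspace).
cluster_spec = {
    "cluster_name": "spark-ui-demo",
    "spark_version": "8.3.x-scala2.12",   # DBR 8.3 (Spark 3.1.1, Scala 2.12)
    "node_type_id": "Standard_DS3_v2",    # worker type
    "driver_node_type_id": "Standard_DS3_v2",
    "num_workers": 1,                     # autoscaling disabled: fixed size
    "autotermination_minutes": 30,
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},  # spot instances disabled
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
print(resp.json())  # returns the new cluster_id on success
```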
Once the cluster has started, create a new Databricks Python notebook. Next, let's run the following code block in a new cell:
```python
from pyspark.sql.functions import *

# Define the schema for reading the stream
schema = "time STRING, action STRING"

# Create a streaming DataFrame
stream_read = (spark ...
```
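The streaming read is truncated above. As a minimal sketch of one plausible completion, the following reads the Databricks sample events dataset as JSON; the `format`, source path, and `maxFilesPerTrigger` option are assumptions, not necessarily the original values:

```python
# A plausible completion (format and source path are assumptions)
stream_read = (spark
               .readStream
               .format("json")
               .schema(schema)
               .option("maxFilesPerTrigger", 1)  # throttle the stream so jobs are easy to watch
               .load("/databricks-datasets/structured-streaming/events/"))

# In a Databricks notebook, displaying a streaming DataFrame starts a
# streaming query, which generates Spark jobs we can follow in the Spark UI.
display(stream_read)
```

Once the query is running, each micro-batch shows up as a job in the Spark UI, which is exactly what we will use to explore its major components.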