Visualizing Spark application execution
In this section, we will present the key details of the Spark web UI, which is indispensable for tuning tasks. There are several approaches to monitoring Spark applications, for example, using web UIs, metrics, and external instrumentation. The information displayed includes a list of scheduler stages and tasks, a summary of RDD sizes and memory usage, environmental information, and information about the running executors.
This interface can be accessed by opening http://<driver-node>:4040 (for example, http://localhost:4040) in a web browser. Additional SparkContexts running on the same host bind to successive ports: 4041, 4042, and so on.
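This port-fallback behavior can be illustrated with a short, hypothetical Python sketch. Spark's own implementation lives inside its Jetty-based web UI server; the function name, host, and retry limit below are illustrative, not Spark APIs:

```python
import socket

def find_ui_port(start_port=4040, max_retries=16):
    """Return the first free TCP port at or above start_port.

    This mimics (in simplified form) what Spark does when launching
    its web UI: if 4040 is already taken by another SparkContext,
    it tries 4041, 4042, and so on.
    """
    for offset in range(max_retries):
        port = start_port + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("localhost", port))
                return port  # the port is free; a UI could bind here
            except OSError:
                continue  # the port is in use, try the next one
    raise RuntimeError("no free port found in the scanned range")
```

This is why a second notebook or shell started on the same machine reports its UI on 4041 rather than failing to start.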
Note
For a more detailed coverage of monitoring and instrumentation in Spark, refer to https://spark.apache.org/docs/latest/monitoring.html.
We will explore Spark SQL execution visually using two examples. First, we create two sets of Datasets. The difference between the first set (t1, t2, and t3) and the second set...