Monitoring Spark jobs in the Spark UI
The Spark UI lets you track the progress and performance of your Spark cluster and its applications. This web-based interface shows you the status and resource usage of your cluster, as well as the details of your Spark jobs, stages, tasks, and SQL queries, making it a helpful tool for debugging and optimizing your Spark applications.
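By default, the Spark UI for a running application is served on port 4040 of the driver (for example, http://localhost:4040 when you run Spark locally). You can also ask a running session for its UI address; this small snippet assumes an active SparkSession named spark:
# Print the URL of the Spark UI for the current application
# (uiWebUrl returns None if the UI is not enabled)
print(spark.sparkContext.uiWebUrl)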
In this recipe, we will see how to monitor your Spark jobs in the Spark UI using an example application that reads a CSV file, infers its schema, filters some rows, groups by a column, and counts the number of groups:
How to do it…
- Run the Spark application: Execute the following code to run a sample Spark application that will read a CSV file into a Spark DataFrame with a specific schema, then filter using release_year, group by country, and finally, display the DataFrame:
from pyspark.sql import SparkSession
# Create a new SparkSession
spark = (SparkSession
...
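The listing above is truncated here. For reference, the following is a minimal sketch of how the rest of the application described in this recipe could look; the application name, file path, filter threshold, and the schema fields other than release_year and country are illustrative placeholders rather than the recipe's exact values:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Create a new SparkSession (the application name is a placeholder)
spark = (SparkSession
         .builder
         .appName("monitor-spark-ui-example")
         .getOrCreate())

# Define a specific schema for the CSV file (columns other than
# release_year and country are illustrative)
schema = StructType([
    StructField("title", StringType(), True),
    StructField("country", StringType(), True),
    StructField("release_year", IntegerType(), True),
])

# Read the CSV file into a DataFrame (the path is a placeholder)
df = (spark.read
      .format("csv")
      .option("header", "true")
      .schema(schema)
      .load("path/to/data.csv"))

# Filter on release_year, group by country, and count the rows per group
result = (df
          .filter(df.release_year >= 2020)
          .groupBy("country")
          .count())

# Display the result; this action triggers the Spark jobs that you can
# then inspect in the Spark UI
result.show()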