Monitoring applications
Spark Streaming jobs produce useful information for understanding the current state of the application. Broadly, there are two ways to monitor Spark Streaming jobs: using the UI and using external tools.
The Spark UI is served by the driver at http://driver-host-name:4040/. When multiple SparkContexts run on the same host at the same time, they bind to successive ports such as 4041, 4042, and so on. The Spark UI provides useful information, such as the event timeline and DAG visualizations, as explained in Chapter 3, Deep Dive into Apache Spark. While a Spark Streaming application is running, a Streaming tab appears in the UI. It shows the number of completed batches, the number of records processed, the batch interval, the total running time of the application, the input rate, the scheduling delay, the processing time, and the total delay. When the Kafka direct API is used, the UI also shows the Kafka topic name, the partition numbers, and the offsets processed in each batch. This is really helpful and easy...
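The per-batch numbers on the Streaming tab can also be read as JSON from Spark's monitoring REST API, which the driver serves on the same port as the UI. The following is a minimal sketch, not a definitive monitoring tool. It assumes Spark 2.2 or later, where the `/api/v1/applications/[app-id]/streaming/batches` endpoint is available. It also assumes the batch fields `batchId`, `batchDuration`, `schedulingDelay`, and `processingTime` (all times in milliseconds) as listed in the Spark monitoring documentation, so verify them against your version. The helper names `ui_base_url`, `streaming_batches`, and `check_batch` are hypothetical. `check_batch` encodes two rules of thumb. Total delay is scheduling delay plus processing time, and a streaming job keeps up only while each batch's processing time stays below the batch interval.

```python
import json
import urllib.request


def ui_base_url(host: str, context_index: int = 0) -> str:
    """UI address of the Nth SparkContext (0-based) started on a host.

    Assumption: ports 4040, 4041, ... were all free, so each context
    bound to the next port in sequence.
    """
    return f"http://{host}:{4040 + context_index}"


def get_json(url: str, timeout: float = 5.0):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def streaming_batches(base_url: str) -> list:
    """Return the retained batch records of the application behind base_url."""
    apps = get_json(f"{base_url}/api/v1/applications")
    app_id = apps[0]["id"]  # a live driver UI serves a single application
    return get_json(f"{base_url}/api/v1/applications/{app_id}/streaming/batches")


def check_batch(batch: dict) -> dict:
    """Summarize one batch record; all times are in milliseconds."""
    interval = batch["batchDuration"]
    sched = batch.get("schedulingDelay")  # missing or null until the batch starts
    proc = batch.get("processingTime")  # missing or null until the batch finishes
    if sched is None or proc is None:
        return {"batchId": batch["batchId"], "status": "incomplete"}
    return {
        "batchId": batch["batchId"],
        "totalDelay": sched + proc,  # total delay = scheduling delay + processing time
        "keepsUp": proc < interval,  # processing must finish within the batch interval
    }
```

For example, `[check_batch(b) for b in streaming_batches(ui_base_url("driver-host-name"))]` lists the delay of each retained batch. A run of batches with `keepsUp` set to `False` means work is queuing up, which also shows on the Streaming tab as a growing scheduling delay.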