Monitoring real-time data processing with Apache Spark Structured Streaming
In this recipe, you will learn how to do the following:
- Use the status and recentProgress attributes of a streaming query to get information about the input rate, processing rate, latency, state size, and more
- Use the StreamingQueryListener API to register a custom listener that can handle events related to the start, progress, and termination of a streaming query
Structured Streaming provides various metrics, along with APIs to access them, so that you can monitor the performance and progress of your streaming queries.
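As a rough illustration of the two approaches, the following minimal sketch reads status and recentProgress from a running query and registers a StreamingQueryListener. It is not the recipe's own code: it assumes PySpark 3.4 or later (where the Python listener API is available), uses the built-in rate source and noop sink instead of Kafka, and the names summarize_progress, run_demo, PrintingListener, and monitoring_sketch are hypothetical. It also assumes that each recentProgress entry can be read like a dictionary.

```python
from typing import Any, Dict


def summarize_progress(progress: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few headline metrics out of one recentProgress entry."""
    state_ops = progress.get("stateOperators") or []
    return {
        "batch_id": progress.get("batchId"),
        "input_rows_per_sec": progress.get("inputRowsPerSecond"),
        "processed_rows_per_sec": progress.get("processedRowsPerSecond"),
        # Time spent on the whole micro-batch, in milliseconds.
        "trigger_ms": (progress.get("durationMs") or {}).get("triggerExecution"),
        # Total rows held in state across all stateful operators.
        "state_rows": sum(op.get("numRowsTotal", 0) for op in state_ops),
    }


def run_demo(seconds: int = 10) -> None:
    """Start a throwaway query (hypothetical setup) and print monitoring info."""
    # Imported lazily so the helper above works without a Spark installation.
    import time
    from pyspark.sql import SparkSession
    from pyspark.sql.streaming import StreamingQueryListener

    class PrintingListener(StreamingQueryListener):
        def onQueryStarted(self, event):
            print(f"started: id={event.id} name={event.name}")

        def onQueryProgress(self, event):
            p = event.progress
            print(f"batch {p.batchId}: {p.numInputRows} rows")

        def onQueryIdle(self, event):  # only invoked on Spark 3.5+
            pass

        def onQueryTerminated(self, event):
            print(f"terminated: id={event.id} exception={event.exception}")

    spark = SparkSession.builder.appName("monitoring-sketch").getOrCreate()
    spark.streams.addListener(PrintingListener())

    # Assumption: a rate source and noop sink stand in for the Kafka pipeline.
    query = (
        spark.readStream.format("rate").option("rowsPerSecond", 5).load()
        .writeStream.format("noop").queryName("monitoring_sketch").start()
    )
    time.sleep(seconds)

    # status holds the keys message, isDataAvailable, and isTriggerActive.
    print(query.status)
    for progress in query.recentProgress:
        print(summarize_progress(progress))

    query.stop()
```

Calling run_demo() on a machine with Spark installed should print listener events as batches complete, followed by the query status and one summary per recent micro-batch. The summarize_progress helper is kept free of Spark imports so it can be reused on progress entries collected elsewhere.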
Getting ready
Before we start, we need to make sure that we have a Kafka cluster running and a topic that produces some streaming data. For simplicity, we will use a single-node Kafka cluster and a topic named users. Open the 5.0 user-gen-kafka.ipynb notebook and execute the cell. This notebook produces a user record every few seconds and puts it on a Kafka topic called...