Configuring triggers for Structured Streaming in Apache Spark
In this recipe, we will learn how to configure triggers for Structured Streaming in Apache Spark. A trigger is an event that kicks off the processing of a micro-batch in a streaming system. In Spark Structured Streaming, we can configure different types of triggers to control the frequency and timing of micro-batches. To specify the trigger for a streaming query, we call the trigger method on the query's DataStreamWriter before starting it. Let's see how to do that with some examples.
Getting ready
Before we start, we need to make sure that we have a Kafka cluster running and a topic that produces some streaming data. For simplicity, we will use a single-node Kafka cluster and a topic named users. Open the 4.0 user-gen-kafka.ipynb notebook and execute the cell. This notebook produces a user record every few seconds and puts it on a Kafka topic called users.
Make sure you have run this notebook and that it is producing records as shown here:
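For reference, the producer notebook's behavior can be sketched roughly as follows. This is a hypothetical reconstruction, not the notebook's actual code: the record schema, the make_user_record and produce_users helpers, and the localhost:9092 broker address are all assumptions for illustration.

```python
import json
import random
import time
import uuid

def make_user_record():
    """Build a synthetic user record (hypothetical schema)."""
    return {
        "id": str(uuid.uuid4()),
        "name": random.choice(["alice", "bob", "carol"]),
        "ts": int(time.time()),
    }

def produce_users(topic="users", n=5, pause=2):
    """Send n user records to a Kafka topic, pausing between sends.

    Assumes kafka-python is installed and a single-node broker is
    reachable at localhost:9092, mirroring the setup described above.
    """
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for _ in range(n):
        producer.send(topic, make_user_record())
        time.sleep(pause)
    producer.flush()
```

Calling produce_users() would then emit one JSON-encoded user record to the users topic every couple of seconds, which is the shape of input the streaming queries in this recipe expect.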