Handling late-arriving data
Late-arriving data is a situation specific to real-time streaming analytics: events related to the same transaction either do not arrive in time to be processed together or arrive out of order. Structured Streaming handles such scenarios through stateful stream processing, which we explore next.
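To make the idea concrete, here is a minimal, framework-free Python sketch of what stateful handling of late data involves. Order amounts are summed into tumbling event-time windows that are kept as state. A watermark, defined as the maximum event time seen so far minus an allowed lateness, decides when a window is final and when an event is too late to count. The `Event` fields, the `WatermarkedWindowSum` class, and the integer timestamps are hypothetical and exist only for illustration. The sketch also updates the watermark after every record. Structured Streaming instead expresses this declaratively with `withWatermark()` and the `window()` function and advances its watermark between micro-batches. It also guarantees only that data within the threshold is not dropped, not that later data always is.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    # Hypothetical order event: who placed it, when it happened, and its value.
    user: str
    event_time: int  # event time in seconds (assumed integer timestamps)
    amount: float


class WatermarkedWindowSum:
    """Sums order amounts per tumbling event-time window, using a watermark
    to finalize windows and drop events that arrive too late (simplified)."""

    def __init__(self, window_size: int, allowed_lateness: int) -> None:
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time: Optional[int] = None
        # The "state": running totals for windows that are still open.
        self.open_windows: Dict[int, float] = defaultdict(float)
        self.dropped: List[Event] = []

    @property
    def watermark(self) -> Optional[int]:
        # Watermark = latest event time seen so far minus the allowed lateness.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event: Event) -> List[Tuple[int, int, float]]:
        """Ingest one event; return any windows finalized as a result."""
        start = event.event_time - event.event_time % self.window_size
        end = start + self.window_size
        wm = self.watermark
        if wm is not None and end <= wm:
            # The event's window was already finalized: it is too late to count.
            self.dropped.append(event)
            return []
        self.open_windows[start] += event.amount
        if self.max_event_time is None or event.event_time > self.max_event_time:
            self.max_event_time = event.event_time
        return self._emit_closed_windows()

    def _emit_closed_windows(self) -> List[Tuple[int, int, float]]:
        # A window is final once the watermark reaches its end; its state is released.
        wm = self.watermark
        closed = sorted(s for s in self.open_windows if s + self.window_size <= wm)
        return [(s, s + self.window_size, self.open_windows.pop(s)) for s in closed]


if __name__ == "__main__":
    agg = WatermarkedWindowSum(window_size=10, allowed_lateness=5)
    events = [
        Event("a", 2, 10.0),
        Event("b", 12, 5.0),
        Event("c", 8, 1.0),   # out of order, but within the allowed lateness
        Event("d", 16, 2.0),  # advances the watermark to 11, closing window [0, 10)
        Event("e", 4, 3.0),   # arrives after window [0, 10) was finalized: dropped
    ]
    for e in events:
        for start, end, total in agg.process(e):
            print(f"window [{start}, {end}) final total = {total}")
    print("dropped:", [e.user for e in agg.dropped])
```

Running it prints `window [0, 10) final total = 11.0` and then `dropped: ['e']`. Event `c` is still counted because the watermark (7 at that point) has not yet passed the end of window [0, 10). Event `e` arrives after event `d` has pushed the watermark to 11, so it is discarded. Keeping only open windows in state is what bounds memory: once the watermark passes a window's end, its total is emitted and its state is released.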
Stateful stream processing using windowing and watermarking
Let's consider the example of an online retail transaction in which a user is browsing the e-tailer's website. We would like to calculate the user's session, which ends when either of two events occurs: the user exits the e-tailer's portal or a timeout occurs. In another example, a user places an order and then updates it, but due to network or other delays, we receive the update before the original order-creation event. Here, we would want to wait to receive any late...