Handling streaming scenarios
In this section, we will see how to tackle common streaming requirements such as joining a stream with other datasets, which may themselves be static (batch) or streaming; recovering from intermittent failures, which may involve restarting the stream after a period of inactivity; and handling late-arriving data.
Joining with other static and dynamic datasets
Joins are a very common operation, and streaming datasets are no exception. Streams frequently need to be joined with other datasets, typically a slowly changing dimension table, to enrich the data for richer analytics. Let's consider an IoT use case in which devices are manufactured in lots and registered in a Delta lookup table. Once the devices are deployed in the field, they start emitting sensor data, which is streaming in nature and arrives at a much higher velocity than the registrations. This IoT data needs to be joined with the device lookup data. Storing device data in Delta...
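To make the join semantics concrete, here is a minimal, pure-Python sketch of a stream-static join. It is not Spark or Delta code. It models only the behavior, under the assumption (consistent with how Delta stream-static joins are commonly described) that the lookup table is re-read at its latest state for each micro-batch. The record types, field names, and functions (`Reading`, `Device`, `DeviceLookup`, `enrich_batch`, `process_stream`) are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical record types; field names are illustrative assumptions.
@dataclass(frozen=True)
class Reading:
    device_id: str
    temperature: float


@dataclass(frozen=True)
class Device:
    device_id: str
    lot: str


class DeviceLookup:
    """Stand-in for the slowly changing device lookup table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Device] = {}

    def register(self, device: Device) -> None:
        # Insert or overwrite the registration for this device.
        self._rows[device.device_id] = device

    def snapshot(self) -> Dict[str, Device]:
        # Copy the current state so one micro-batch sees a consistent view.
        return dict(self._rows)


def enrich_batch(batch: List[Reading],
                 lookup: Dict[str, Device]) -> List[Tuple[str, str, float]]:
    """Inner join: keep only readings whose device is registered, adding its lot."""
    joined = []
    for reading in batch:
        device = lookup.get(reading.device_id)
        if device is not None:
            joined.append((reading.device_id, device.lot, reading.temperature))
    return joined


def process_stream(micro_batches: List[List[Reading]],
                   lookup: DeviceLookup,
                   new_registrations: Dict[int, List[Device]]):
    """new_registrations[i] lists devices registered before micro-batch i runs."""
    results = []
    for i, batch in enumerate(micro_batches):
        for device in new_registrations.get(i, []):
            lookup.register(device)
        # Re-read the latest lookup state for every micro-batch.
        results.append(enrich_batch(batch, lookup.snapshot()))
    return results


if __name__ == "__main__":
    lookup = DeviceLookup()
    lookup.register(Device("d1", "lot-A"))
    batches = [
        [Reading("d1", 21.5), Reading("d2", 19.0)],
        [Reading("d2", 20.1), Reading("d3", 18.7)],
    ]
    print(process_stream(batches, lookup, {1: [Device("d2", "lot-B")]}))
    # prints [[('d1', 'lot-A', 21.5)], [('d2', 'lot-B', 20.1)]]
```

Note that the reading from `d2` in the first micro-batch is dropped rather than retried once `d2` is registered: an inner stream-static join keeps no state, so earlier micro-batches are never revisited.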