Advanced techniques in Structured Streaming
Structured Streaming has several built-in capabilities that make it the default choice even for some batch operations. Instead of architecting these properties yourself, you can rely on Structured Streaming to handle them for you. Some of them are as follows.
Handling fault tolerance
Fault tolerance is crucial in streaming systems to ensure data integrity and reliability. Structured Streaming provides built-in fault tolerance mechanisms to handle failures in both streaming sources and sinks:
- Source fault tolerance: Structured Streaming makes reads from sources fault tolerant by checkpointing the metadata related to the stream, such as the offsets it has read from each source, to durable storage. Watermarks track the progress of event time, which determines how late data and old state are handled. If a failure occurs, the system recovers from the checkpoint and resumes processing from the last consistent state.
- Sink fault tolerance: Fault tolerance in sinks depends on the guarantees provided by the specific sink implementation. Some sinks may inherently...
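The recovery behavior described in the list above can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Spark's implementation: the `SOURCE` list, the `offsets.json` file, and the functions `load_offset`, `save_offset`, and `run` are all hypothetical names invented for the illustration. It assumes a source that can be re-read from any offset and a sink where rewriting the same batch has no additional effect. In Spark itself, the corresponding step is setting the `checkpointLocation` option on the streaming write.

```python
import json
import os
import tempfile

# Toy model only: this is NOT Spark's implementation, just the idea behind it.
SOURCE = ["a", "b", "c", "d", "e", "f"]  # replayable source: any offset can be re-read


def load_offset(checkpoint_path):
    """Return the next offset to read, or 0 if no checkpoint exists yet."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["next_offset"]


def save_offset(checkpoint_path, next_offset):
    """Record progress atomically so a crash never leaves a half-written checkpoint."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp_path, checkpoint_path)


def run(checkpoint_path, sink, batch_size=2, crash_after_write_at=None):
    """Process micro-batches starting from the last checkpointed offset.

    `sink` is a dict keyed by batch start offset, so rewriting a batch is idempotent.
    `crash_after_write_at` simulates a failure after the sink write but before
    the checkpoint commit for the batch that starts at that offset.
    """
    offset = load_offset(checkpoint_path)
    while offset < len(SOURCE):
        batch = SOURCE[offset:offset + batch_size]
        sink[offset] = batch  # idempotent write: same key and value on replay
        if crash_after_write_at == offset:
            raise RuntimeError("simulated failure before checkpoint commit")
        offset += len(batch)
        save_offset(checkpoint_path, offset)


checkpoint = os.path.join(tempfile.mkdtemp(), "offsets.json")
sink = {}
try:
    run(checkpoint, sink, crash_after_write_at=2)  # first attempt fails mid-stream
except RuntimeError:
    pass
run(checkpoint, sink)  # the restart resumes from the last committed offset

result = [record for key in sorted(sink) for record in sink[key]]
print(result)
```

The batch that was in flight when the failure occurred is read and written a second time after the restart. Because the sink write is keyed by the batch's start offset, the replay overwrites the earlier write instead of duplicating it, which is why the guarantee a pipeline gets depends on the sink as well as the source.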