Configuring Checkpoints and Watermarking
ASA, Event Hubs, and Spark offer checkpointing and watermarking options to provide fault tolerance and to handle event time. The two concepts are explained as follows:
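- Checkpoints track the progress of data processing, allowing you to resume from the point of failure or interruption instead of starting over. In Azure Event Hubs, consumers use checkpoints to record the position (offset) of the last processed event in each partition. By configuring a storage account to hold this checkpoint data, you let processing resume from the last checkpoint after a failure rather than from the beginning of the stream, which minimizes downtime.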
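- Watermarking tracks progress through a specific column, such as a timestamp. It is useful for incremental data loads, where you process only records that have been added or modified since the last run. By configuring a watermark column, you record the latest processed value so that subsequent runs start after it. In stream processing engines such as Spark Structured Streaming, a watermark on an event-time column also determines how long the engine waits for late-arriving events before finalizing results.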
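The two ideas above often work together in incremental loads: the watermark value (the latest timestamp processed) is itself checkpointed, so a rerun after a failure resumes from the last committed point. The following is a minimal, platform-neutral Python sketch of that pattern, not the Event Hubs, ASA, or Spark API. The `modified_at` column, the local JSON checkpoint file, and the helper names `load_watermark`, `save_watermark`, and `incremental_run` are all hypothetical; a real pipeline would keep the checkpoint in durable storage such as the storage account mentioned above.

```python
import json
import os
import tempfile
from datetime import datetime


def load_watermark(path, default):
    """Return the last checkpointed watermark, or `default` on the first run."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return datetime.fromisoformat(json.load(f)["watermark"])


def save_watermark(path, value):
    """Checkpoint the watermark by writing a temp file and renaming it,
    so a crash mid-write cannot leave a half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"watermark": value.isoformat()}, f)
    os.replace(tmp, path)


def incremental_run(records, checkpoint_path):
    """Process only records whose hypothetical `modified_at` column is newer
    than the checkpointed watermark, then advance the watermark."""
    last = load_watermark(checkpoint_path, datetime.min)
    new_rows = sorted(
        (r for r in records if r["modified_at"] > last),
        key=lambda r: r["modified_at"],
    )
    for row in new_rows:
        pass  # placeholder: write `row` to a sink in a real pipeline
    # Advance the watermark only after processing succeeds.
    if new_rows:
        save_watermark(checkpoint_path, new_rows[-1]["modified_at"])
    return new_rows


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "watermark.json")
    table = [
        {"id": 1, "modified_at": datetime(2024, 1, 1, 9, 0)},
        {"id": 2, "modified_at": datetime(2024, 1, 1, 10, 0)},
    ]
    first = incremental_run(table, ckpt)   # first run: no checkpoint yet
    table.append({"id": 3, "modified_at": datetime(2024, 1, 1, 11, 0)})
    second = incremental_run(table, ckpt)  # resumes from the saved watermark
    print([r["id"] for r in first], [r["id"] for r in second])  # [1, 2] [3]
```

Because the comparison is strictly greater-than, a record that arrives later with exactly the same timestamp as the saved watermark would be skipped; pipelines with coarse timestamps often pair the watermark with a unique key or reprocess a small overlap window to avoid this.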
Checkpointing in ASA
ASA performs checkpointing internally and periodically, so users do not need to checkpoint explicitly. These checkpoints are used for job recovery during...