Exactly-once delivery is the holy grail of streaming analytics. Processing the same event more than once in a streaming job is inconvenient at best and, depending on the application, can be unacceptable. For example, a billing application that misses an event or processes an event twice could lose revenue or overcharge customers. Guaranteeing that such scenarios never happen is difficult; any project seeking this property must make trade-offs between availability and consistency. One major difficulty is that a streaming pipeline may have multiple stages, and exactly-once delivery must hold at every stage. Another is that intermediate results can affect the final computation: once results have been exposed downstream, retracting them causes problems.
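To make the billing example concrete, the following minimal sketch contrasts a single billing stage that trusts every delivered event with one that skips event IDs it has already applied. It assumes each event carries a unique `event_id` assigned by the source; the `Event` type and both functions are hypothetical and do not describe any particular streaming engine's mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str      # hypothetical: unique ID assigned by the source
    customer: str
    amount_cents: int


def bill_naive(events):
    """Sum charges per customer, trusting every delivered event."""
    totals = {}
    for e in events:
        totals[e.customer] = totals.get(e.customer, 0) + e.amount_cents
    return totals


def bill_dedup(events, seen=None):
    """Sum charges per customer, skipping event IDs already applied."""
    seen = set() if seen is None else seen
    totals = {}
    for e in events:
        if e.event_id in seen:
            continue  # replayed duplicate: this event was already charged
        seen.add(e.event_id)
        totals[e.customer] = totals.get(e.customer, 0) + e.amount_cents
    return totals


# The source replays e2 after a simulated failure (at-least-once delivery).
delivered = [
    Event("e1", "alice", 500),
    Event("e2", "bob", 300),
    Event("e2", "bob", 300),  # duplicate produced by the retry
]

print(bill_naive(delivered))  # {'alice': 500, 'bob': 600}
print(bill_dedup(delivered))  # {'alice': 500, 'bob': 300}
```

Here the `seen` set lives only in memory and covers only this one stage. In a real multi-stage pipeline that set would have to survive failures, and every downstream stage would need its own guarantee, which is exactly why exactly-once delivery is hard to provide end to end.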
It is useful to provide exactly-once guarantees...