How transparent fault tolerance and exactly-once delivery guarantee is achieved
Apache Spark structured streaming supports full crash fault tolerance and exactly-once delivery guarantee without the user taking care of any specific error handling routines. Isn't this amazing? So how is this achieved?
Note
Full crash fault tolerance and exactly-once delivery guarantee are terms of systems theory. Full crash fault tolerance means that you can basically pull the power plug of the whole data center at any point in time, and no data is lost or left in an inconsistent state. Exactly-once delivery guarantee means, even if the same power plug is pulled, it is guaranteed that each tuple- end-to-end from the data source to the data sink - is delivered - only, and exactly, once. Not zero times and also not more than one time. Of course, those concepts must also hold in case a single node fails or misbehaves (for example- starts throttling).
First of all, states between individual batches and offset ranges...