Increased performance with good old friends
Apache Spark structured streaming is part of Apache SparkSQL. So, just as it does for batch processing, the planner (Catalyst) creates execution plans for streams; here they are incremental execution plans that run over mini batches. This means that the whole streaming model is based on batches, and this is why a unified API for stream and batch processing could be achieved. The price we pay is that Apache Spark streaming can fall short when latency requirements are very low (sub-second, in the range of tens of milliseconds). As the name Structured Streaming and the use of DataFrames and Datasets imply, we also benefit from the performance improvements of project Tungsten, which was introduced in a previous chapter. To the Tungsten engine itself, a mini batch doesn't look considerably different from an ordinary batch; only Catalyst is aware of the incremental nature of streams. Therefore, as of Apache Spark V2.2, the following operations are not (yet...