Moving toward real-time systems
As their names suggest, batch ingestion brings data in periodically, whereas streaming ingests data either continuously or in micro-batches. There is no denying that the trend is toward real-time ingestion, analysis, and consumption of data. This raises a question: why is every pipeline not a streaming one?
There may be several producers of data for the same target table. Some may be fast-moving, while others could be slower. If your data arrives only once a month, then, to save costs, you certainly do not want compute running more often than once a month. Hence, some folks may say that cases such as these force us to use batch ingestion. In this chapter, we will argue that batch is actually a type of streaming workload and that all workloads can be expressed as a streaming pipeline. You may argue, 'Isn't streaming more complex...