Streaming is the act of working on a potentially infinite dataset. The data does not have to be infinite, but we have to allow for a data source that never runs out. In the traditional approach to processing data, we usually run through three main steps:
- Open/get access to a data source.
- Process the data once it is fully loaded.
- Spit out computed data to another location.
We can think of this as the basics of input and output (I/O). Most of our concepts of I/O involve batch processing, that is, working on all or almost all of the data at once. This means that we know the limits of that data ahead of time, so we can make sure that we have enough memory, storage space, computing power, and so on, to deal with it. Once we are done with the process, we exit the program or queue up the next batch of data.
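To make the contrast concrete, here is a minimal Python sketch of the three batch steps, followed by a streaming-style counterpart. The function names (`batch_total`, `running_totals`), the file names, and the one-number-per-line input format are all assumptions made for illustration; they do not come from the text.

```python
import json
import os
import tempfile
from itertools import count, islice
from typing import Iterable, Iterator


def batch_total(src_path: str, dst_path: str) -> int:
    """Batch style (hypothetical helper): load everything, process, write."""
    # Step 1: open the data source.
    with open(src_path, "r", encoding="utf-8") as src:
        # Step 2: read() pulls the whole file into memory before any work starts.
        numbers = [int(line) for line in src.read().splitlines() if line.strip()]
    total = sum(numbers)
    # Step 3: write the computed data to another location.
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump({"count": len(numbers), "total": total}, dst)
    return total


def running_totals(lines: Iterable[str]) -> Iterator[int]:
    """Streaming style (hypothetical helper): one item at a time, no full load."""
    total = 0
    for line in lines:
        if line.strip():
            total += int(line)
            # Emit a result as soon as each item has been processed.
            yield total


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src_path = os.path.join(workdir, "numbers.txt")
    dst_path = os.path.join(workdir, "summary.json")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("1\n2\n3\n4\n")

    # Batch: the size of the input is known and it fits in memory.
    print(batch_total(src_path, dst_path))

    # Streaming: the source never ends, so we only take what we need.
    endless = (str(n) for n in count(1))
    print(list(islice(running_totals(endless), 5)))
```

The batch function only works because the whole input fits in memory and has a known end. The generator makes no such assumption, which is why it can consume a source that never finishes.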
A simple example of this is seen as...