Streaming data
Streaming data is a frequently misunderstood topic, because streaming is often equated with real-time data processing. Real-time processing requires a compute resource that runs continuously to keep the data as close to up to date as possible, so it is perceived as very expensive. Some engineering teams avoid this architecture because of budgetary constraints, or because their use case only requires data to be refreshed at a fixed frequency, such as daily, twice a day, or hourly. While this reasoning holds in many scenarios, it misses the main purpose of a streaming architecture: incremental processing, meaning that each run processes only the data that has arrived since the previous run instead of reprocessing the full dataset. Incremental processing is something of a holy grail in data engineering, because a pipeline that processes less data typically costs less to run.
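To make the cost argument concrete, the following is a minimal, engine-agnostic sketch in plain Python. It is not the API of any streaming framework; `IncrementalAggregator`, `full_recompute`, and `simulate` are hypothetical names, and the number of records read is used as a simplified stand-in for compute cost. The incremental version remembers an offset (a checkpoint) and reads only records added after it, while the full-recompute version rereads the entire source on every run. Both produce the same answer.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IncrementalAggregator:
    """Hypothetical incremental job: keeps a running total plus a checkpoint offset."""
    offset: int = 0             # index of the next unprocessed record (the checkpoint)
    running_total: int = 0      # result carried over between runs
    records_processed: int = 0  # cumulative records read, a simplified proxy for cost

    def run(self, source: list[int]) -> int:
        new_records = source[self.offset:]  # only data that arrived since the last run
        self.running_total += sum(new_records)
        self.records_processed += len(new_records)
        self.offset = len(source)           # advance the checkpoint past what was read
        return self.running_total


def full_recompute(source: list[int]) -> tuple[int, int]:
    """Hypothetical batch job: rereads every record on each run; returns (total, records read)."""
    return sum(source), len(source)


def simulate(batches: Iterable[list[int]]) -> tuple[int, int, int]:
    """Feed batches into a growing source and compare the work done by both approaches."""
    source: list[int] = []
    inc = IncrementalAggregator()
    full_cost = 0
    for batch in batches:
        source.extend(batch)                       # new data lands in the source
        inc_total = inc.run(source)                # incremental: reads only the new records
        full_total, cost = full_recompute(source)  # full batch: rereads everything
        full_cost += cost
        assert inc_total == full_total             # both approaches agree on the result
    return inc.running_total, inc.records_processed, full_cost


if __name__ == "__main__":
    # Three runs: the source grows from 2 to 4 to 5 records.
    print(simulate([[5, 3], [2, 4], [1]]))  # (15, 5, 11)
```

Over these three runs the incremental job reads 5 records in total, while the full recompute reads 2 + 4 + 5 = 11, and the gap widens as the source grows. This holds whether the job runs continuously or on a daily or hourly schedule, which is why incremental processing, rather than latency, is the main benefit of a streaming design.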
This section will show how to stream from different sources and process these streams into different destinations or sinks.
There are many different ways to set up...