To stream or not to stream
Streams are datasets that update continuously as each new data message arrives, with little to no latency. Streaming analytics operates on this continuously updating dataset at much shorter intervals than batch processing. The term real-time analytics is something of a misnomer when applied to streaming analytics, because processing typically runs at intervals measured in minutes rather than continuously. The interval frequency affects processing and technology requirements, so set intervals as long as the use case allows in order to save costs.
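To make the idea of interval-based processing concrete, here is a minimal Python sketch that groups timestamped messages into fixed intervals and averages each one. The function name `aggregate_by_interval`, the `(timestamp, value)` message shape, and the sample data are all assumptions made for illustration; they do not come from any particular streaming product.

```python
from collections import defaultdict


def aggregate_by_interval(messages, interval_seconds):
    """Average (timestamp, value) messages within fixed-length intervals.

    Hypothetical helper for illustration: timestamps are whole seconds,
    and each interval is identified by its starting second.
    """
    buckets = defaultdict(list)
    for timestamp, value in messages:
        # Round the timestamp down to the start of its interval.
        bucket_start = (timestamp // interval_seconds) * interval_seconds
        buckets[bucket_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Made-up sample messages: (seconds since start, reading)
messages = [(0, 10.0), (30, 20.0), (65, 30.0), (130, 50.0)]

one_minute = aggregate_by_interval(messages, 60)
five_minute = aggregate_by_interval(messages, 300)

print(one_minute)   # three intervals, so three results to compute and store
print(five_minute)  # one interval, so a single result
```

With the same input, the one-minute interval yields three results while the five-minute interval yields one, which is the sense in which longer intervals reduce the amount of processing.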
Stream datasets normally keep data for a window of time and then discard it. Specialized technology and processing options exist to handle streams; for the most part, these are in addition to the long-term big data storage technology that we have focused on in this chapter. Amazon Kinesis is an example of a specialized data streaming service.
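The keep-then-discard behavior of a stream dataset can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the class name `WindowedStream` is hypothetical, messages are assumed to arrive in timestamp order, and nothing here reflects how Amazon Kinesis or any other service is implemented.

```python
from collections import deque


class WindowedStream:
    """Hypothetical in-memory stream that retains data for a fixed window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Store a new message, then discard anything older than the window."""
        self._items.append((timestamp, value))
        self._expire(timestamp)

    def _expire(self, now):
        # Assumes messages arrive in timestamp order, so the oldest
        # message is always at the left end of the deque.
        cutoff = now - self.window_seconds
        while self._items and self._items[0][0] <= cutoff:
            self._items.popleft()

    def values(self):
        """Return the values currently retained, oldest first."""
        return [value for _, value in self._items]


stream = WindowedStream(window_seconds=60)
stream.add(0, "a")
stream.add(30, "b")
stream.add(70, "c")     # "a" is now more than 60 seconds old and is dropped
print(stream.values())  # ['b', 'c']
```

Anything that must outlive the window has to be written to long-term storage before it expires, which is why streaming technology usually sits alongside a big data store rather than replacing it.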
The technology and the programming code base needed to support analytics are (usually...