Processing streaming data
In the big data era, people like to associate big data with real-time data. Some people say that if the data is not real time, then it's not big data. This statement is only partially true. In reality, most data pipelines in the world still use the batch approach, which is why it remains very important for data engineers to understand batch data pipelines. From Chapter 3, Building a Data Warehouse on BigQuery, to Chapter 5, Building a Data Lake Using Dataproc, we focused on handling batch data pipelines.
However, real-time capabilities are something that many data engineers now need to account for when they rethink their data architecture. To understand how real-time requirements shape that architecture, we first need a clear definition of what real-time data is.
From the end user's perspective, real-time data can mean many things: faster access to data, more frequent data refreshes, or detecting events as soon as they happen. From a...