Summary
That brings us to the end of this chapter, one of the most important both from a syllabus and a data engineering perspective. Batch and streaming solutions are fundamental to building an effective Big Data processing system.
To summarize what you learned in this chapter: you started with designs for streaming systems using Event Hubs, ASA, and Spark Streaming, and then moved on to time series data and its key concepts, including windowed aggregates, checkpointing, replaying archived data, handling schema drift, scaling with partitions, and adding processing units. Following the exam Study Guide, you then took a detour into the distinction between analytical and transactional processing and how to optimize pipelines for each type of access, with Cosmos DB as a notable example. Finally, you returned to the topic of stream data, explored the upsert feature, and, towards the end, learned about error handling and dealing with interruptions.