Summary
In this chapter, we went over some of the basic theoretical concepts you will need to understand in order to keep up with the following chapters. These include the difference between processing time and event time, which is the key knowledge for being able to define the correctness of streaming computation. Processing time is mostly useful for defining the rate of the (partial) result emission via triggers, because otherwise you would always have to wait for the end of the window to get a result. We have also seen how different accumulation modes affect the output of a computation.
We have walked through the life cycle of states, as needed for aggregations. We have seen that watermarks are a systematic approach for the definition of the position in the event time and, as such, define the relationship between the event time and the processing time. We also walked through how to write your first pipeline using Beam. We'll be using these lessons as a foundation for everything we cover throughout this book.
In Chapter 2, Implementing, Testing, and Deploying Basic Pipelines, we'll be developing our understanding of pipelines even further, covering the implementation, testing, and deployment of pipelines to real distributed runners.