You're reading from Building Big Data Pipelines with Apache Beam: Use a single programming model for both batch and stream data processing

Product type Paperback

Published in Jan 2022

Publisher Packt

ISBN-13 9781800564930

Length 342 pages

Edition 1st Edition

Languages

Python

Tools

Apache Beam

Concepts

Big Data

Author (1):

Jan Lukavský


Table of Contents (13) Chapters

Preface

1. Section 1 Apache Beam: Essentials

2. Chapter 1: Introduction to Data Processing with Apache Beam

3. Chapter 2: Implementing, Testing, and Deploying Basic Pipelines

4. Chapter 3: Implementing Pipelines Using Stateful Processing

5. Section 2 Apache Beam: Toward Improving Usability

6. Chapter 4: Structuring Code for Reusability

7. Chapter 5: Using SQL for Pipeline Implementation

8. Chapter 6: Using Your Preferred Language with Portability

9. Section 3 Apache Beam: Advanced Concepts

10. Chapter 7: Extending Apache Beam's I/O Connectors

11. Chapter 8: Understanding How Runners Execute Pipelines

12. Other Books You May Enjoy

Summary

In this chapter, we covered the basic theoretical concepts you will need in order to follow the remaining chapters. These include the difference between processing time and event time, a distinction that is key to defining the correctness of a streaming computation. Processing time is mostly useful for controlling, via triggers, how often (partial) results are emitted; without such triggers, you would always have to wait for the end of the window to get a result. We have also seen how different accumulation modes affect the output of a computation.
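To make the effect of accumulation modes concrete, here is a minimal sketch in plain Python. It does not use the Beam API; the function `fire_panes` and its input layout are hypothetical names introduced only for illustration. It assumes a single window whose trigger fires three times and shows what a summing aggregation would emit at each firing in an accumulating versus a discarding mode.

```python
from typing import Iterable, List


def fire_panes(panes: Iterable[List[int]], accumulating: bool) -> List[int]:
    """Return the sum emitted at each trigger firing within one window.

    `panes` holds the elements that arrived between consecutive firings
    (a hypothetical input layout chosen for this illustration).
    """
    emitted = []
    running = 0
    for pane in panes:
        if accumulating:
            # Accumulating mode: state is kept across firings, so each
            # result refines the previous one.
            running += sum(pane)
            emitted.append(running)
        else:
            # Discarding mode: state is dropped after each firing, so each
            # result covers only the elements seen since the last firing.
            emitted.append(sum(pane))
    return emitted


panes = [[1, 2], [3], [4, 5]]
accumulated = fire_panes(panes, accumulating=True)   # [3, 6, 15]
discarded = fire_panes(panes, accumulating=False)    # [3, 3, 9]
```

In the accumulating case, the last emitted value is the complete result for the window; in the discarding case, a downstream consumer would have to add the emitted values together to obtain it.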

We have walked through the life cycle of state, as needed for aggregations. We have seen that watermarks are a systematic way to define the current position in event time and, as such, they define the relationship between event time and processing time. We also walked through how to write your first pipeline using Beam. We'll use these lessons as a foundation for everything we cover throughout this book.
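The relationship between watermarks, event time, and processing time can be sketched in a few lines of plain Python. This is not how any particular runner computes watermarks; it assumes a simple bounded-delay heuristic (the largest event time seen so far minus a fixed `max_delay`), and the names `watermark_trace` and `window_close_time` are hypothetical.

```python
from typing import List, Optional, Tuple

# Each arrival is (processing_time, event_time), listed in processing-time order.
Arrival = Tuple[int, int]


def watermark_trace(arrivals: List[Arrival], max_delay: int) -> List[Tuple[int, int]]:
    """Return (processing_time, watermark) after each arrival.

    Assumed heuristic: watermark = max event time seen so far - max_delay.
    """
    trace = []
    max_event_time = None
    for processing_time, event_time in arrivals:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        trace.append((processing_time, max_event_time - max_delay))
    return trace


def window_close_time(arrivals: List[Arrival], window_end: int,
                      max_delay: int) -> Optional[int]:
    """Return the processing time at which the watermark first reaches window_end."""
    for processing_time, watermark in watermark_trace(arrivals, max_delay):
        if watermark >= window_end:
            return processing_time
    return None  # the watermark never passed the end of the window


arrivals = [(100, 1), (101, 7), (102, 4), (103, 12), (104, 9), (105, 15)]
trace = watermark_trace(arrivals, max_delay=3)
# [(100, -2), (101, 4), (102, 4), (103, 9), (104, 9), (105, 12)]
closed_at = window_close_time(arrivals, window_end=10, max_delay=3)  # 105
```

Under these assumptions, the window ending at event time 10 can be finalized at processing time 105. The element with event time 9 arrives out of order at processing time 104, but the watermark is still 9 at that point, so it is included in the window rather than treated as late.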

In Chapter 2, Implementing, Testing, and Deploying Basic Pipelines, we'll develop our understanding of pipelines further, covering their implementation, testing, and deployment to real distributed runners.
