Debugging pipelines and using Apache Beam metrics for observability
Observability is key to spotting potential issues in a running pipeline. Metrics can be used to measure various performance characteristics, including the number of elements processed, the number of RPCs made to backend services, and the distribution of the event-time lags of elements flowing through the pipeline.
Although it would be possible to create a side output for each metric and handle the resulting stream like any other data in the pipeline, the need for quick and simple feedback from running pipelines led Beam to provide a simple API dedicated to metrics. Currently, Beam supports the following metric types:
- Counters
- Gauges
- Distributions
A Counter instance is a metric that is represented by a single long value that can only be incremented or decremented (this can be by 1, or by another number).
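The counter semantics described above can be illustrated with a minimal, dependency-free sketch. The `ToyCounter` class and the `count_rpc_calls` helper are hypothetical names introduced only for this illustration and are not part of Beam. The method names `inc` and `dec` are chosen to mirror those of the counter that Beam's Python SDK returns from `Metrics.counter(namespace, name)`, and a plain Python `int` stands in for the long value.

```python
class ToyCounter:
    """Hypothetical stand-in for a counter metric; this is not Beam's class."""

    def __init__(self, namespace: str, name: str) -> None:
        self.namespace = namespace
        self.name = name
        self._value = 0  # the single integer value the metric holds

    def inc(self, n: int = 1) -> None:
        """Add n (1 by default) to the counter."""
        self._value += n

    def dec(self, n: int = 1) -> None:
        """Subtract n (1 by default) from the counter."""
        self._value -= n

    @property
    def value(self) -> int:
        """Current value; the only ways to change it are inc() and dec()."""
        return self._value


def count_rpc_calls(batches):
    """Count processed elements and simulated RPCs over a list of batches.

    Assumes, for illustration only, one backend call per batch.
    """
    elements = ToyCounter("example", "elements_processed")
    rpc_calls = ToyCounter("example", "rpc_calls")
    for batch in batches:
        rpc_calls.inc()           # increment by 1: one simulated call per batch
        elements.inc(len(batch))  # increment by another number in one step
    return elements.value, rpc_calls.value


if __name__ == "__main__":
    # Three batches holding six elements in total.
    print(count_rpc_calls([["a", "b"], ["c"], ["d", "e", "f"]]))  # (6, 3)
```

In a real pipeline the value would not be read back inside the processing code as it is here; the runner collects and reports it. The sketch only shows that a counter is changed through relative updates and is never set to an absolute value.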
A Gauge instance is a metric that also holds a single long value; however, this value...