Chapter 5 – Configuration and Control Plane
- We’d need tail-based sampling, which is applied after a span or trace ends, when we know its duration and whether there were any failures. Tail-based sampling can’t be done inside the process: in a distributed, multi-instance application, no single process sees the whole trace. However, we can use the tail-based sampling processor in the OpenTelemetry Collector, which buffers traces and then samples them based on latency or status code.
If we only capture suspicious traces, we will not have a baseline anymore – we won’t be able to use traces to observe normal system behavior, build analytics, and so on. So, we should additionally capture a percentage (or a fixed rate) of randomly chosen traces. If we mark them somehow, we can analyze them separately from problematic traces to create unbiased analytics.
It’s always a good idea to rate-limit the captured traces as well, so that traffic bursts don’t overload the telemetry pipeline.
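The decision logic described above can be sketched in a few lines. This is a minimal, illustrative model only, not the OpenTelemetry Collector’s implementation: the `Span` and `TailSampler` types, the thresholds, and the tag names are all hypothetical, and the rate limiter is a simple fixed one-second window.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool = False


@dataclass
class TailSampler:
    """Hypothetical tail sampler: buffers spans, decides per finished trace."""
    latency_threshold_ms: float = 1000.0   # assumed "slow" threshold
    baseline_probability: float = 0.05     # assumed share of random traces
    max_traces_per_second: int = 100       # assumed overall rate limit
    rng: random.Random = field(default_factory=random.Random)
    _buffer: dict = field(default_factory=dict)
    _window_start: float = 0.0
    _kept_in_window: int = 0

    def add_span(self, span: Span) -> None:
        # Buffer spans until the whole trace is considered complete.
        self._buffer.setdefault(span.trace_id, []).append(span)

    def decide(self, trace_id: str, now: float):
        """Return a list of reason tags if the trace is kept, else None."""
        spans = self._buffer.pop(trace_id, [])
        if not spans:
            return None

        tags = []
        # Draw the random decision for every trace, suspicious or not,
        # so traces tagged "baseline" stay a random sample of all traffic.
        if self.rng.random() < self.baseline_probability:
            tags.append("baseline")
        if any(s.is_error for s in spans):
            tags.append("error")
        if max(s.duration_ms for s in spans) >= self.latency_threshold_ms:
            tags.append("slow")
        if not tags:
            return None

        # Fixed-window rate limit applied to everything we would keep.
        if now - self._window_start >= 1.0:
            self._window_start = now
            self._kept_in_window = 0
        if self._kept_in_window >= self.max_traces_per_second:
            return None
        self._kept_in_window += 1
        return tags
```

The returned tags play the role of the marker mentioned above: analytics can filter on `"baseline"` to look at randomly chosen traces separately from those kept because they were slow or failed. Note that in this sketch the rate limit can also drop baseline traces during bursts, which would need to be accounted for when extrapolating counts.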
In addition...