Controlling costs with sampling
Tracing every operation lets us debug individual issues in the system, even very rare ones, but it can be impractical from a performance and telemetry storage perspective.
The performance impact of optimized, succinct instrumentation is usually low, but ingesting, processing, storing, and querying telemetry, along with other observability experiences, can be very costly.
Observability vendors’ pricing models vary – some charge for the volume of ingested traces, others for the number of events, traces, or hosts reporting data. The ingestion cost usually includes retaining telemetry for 1 to 3 months. Data retrieval and scanning are also billed by some vendors. Essentially, costs associated with sending and retrieving telemetry grow along with telemetry volume.
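To make the relationship between volume and cost concrete, here is a rough back-of-the-envelope sketch in Python. The function names, the traffic figures, and the price per gigabyte are all hypothetical and do not reflect any vendor's actual pricing; the `sampling_rate` parameter simply scales the number of recorded spans.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_ingested_gb(requests_per_second, spans_per_request,
                        bytes_per_span, sampling_rate=1.0, days=30):
    """Estimate the telemetry volume (in GB) ingested over `days` days.

    All inputs are assumptions the caller supplies; `sampling_rate` is the
    fraction of traces that are recorded (1.0 means everything is traced).
    """
    spans = (requests_per_second * spans_per_request
             * SECONDS_PER_DAY * days * sampling_rate)
    return spans * bytes_per_span / 1e9


def monthly_cost(volume_gb, price_per_gb):
    """Volume-based ingestion cost; real pricing models may differ."""
    return volume_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical service: 100 requests/s, 10 spans per request, ~1 KB per span.
    full = monthly_ingested_gb(100, 10, 1000)
    sampled = monthly_ingested_gb(100, 10, 1000, sampling_rate=0.01)
    price = 0.5  # hypothetical price per ingested GB
    print(f"all traces: {full:.1f} GB, cost {monthly_cost(full, price):.2f}")
    print(f"1% sampled: {sampled:.1f} GB, cost {monthly_cost(sampled, price):.2f}")
```

Under a purely volume-based model like this one, cost scales linearly with the fraction of traces recorded, which is why reducing the number of recorded traces is the main lever for controlling it.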
Realistically, we’re going to be interested in a very small fraction of traces – ones that record failures, long requests, and other rare cases. We...