Using diagnostics tools in production
In production, we need to collect some data proactively, within a reasonable performance and telemetry budget, so that we can analyze it afterward.
In a secure, distributed application, it’s difficult to reproduce an issue on a specific instance of a running process and collect performance traces or dumps from it. If an issue such as a slow memory leak or a rare deadlock affects just a few instances, it might be difficult to even detect it, and by the time it is detected, the affected instance may have already been recycled, so the issue is no longer visible.
Continuous profiling
What we’re looking for is a continuous profiler – a tool that collects sampled performance traces. It can run for short periods to minimize the performance impact of collection on each instance and send profiles to central storage, where they can be stored, correlated with distributed traces, queried, and viewed. Distributed tracing supports sampling and...