When you eliminate the impossible, whatever remains, however improbable, must be the truth.
- Spock
So far, we have explored how to gather metrics and how to create alerts that notify us when there is an issue. We have also learned how to query metrics and dig for the information we might need when searching for the cause of a problem. Now we'll expand on that and try to debug a simulated issue.
Saying that an application does not work correctly is not enough on its own. We should be much more precise. Our goal is to pinpoint not only which application is malfunctioning, but also which part of it is the culprit. We should be able to blame a specific function, a method, a request path, and so on. The more precisely we can detect which part of an application is causing a problem, the faster we will find the cause...
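To make the idea of precision more concrete, here is a minimal sketch in standard-library Python. It records request durations under method and path labels, then asks which of them is slow. The helper names `observe` and `slowest`, the paths, and the 0.5-second threshold are all hypothetical. In a real system, the durations would be exported as labelled metrics (for example, a Prometheus histogram) and queried there, not kept in an in-memory dictionary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical in-memory store: request durations (seconds) keyed by (method, path).
# This stands in for labelled metrics; it is not how a metrics library stores data.
durations = defaultdict(list)


def observe(method, path, seconds):
    """Record one request's duration under its method and path labels."""
    durations[(method, path)].append(seconds)


def slowest(threshold):
    """Return ((method, path), average) pairs whose average duration
    exceeds the threshold, slowest first."""
    offenders = [
        (key, mean(values))
        for key, values in durations.items()
        if mean(values) > threshold
    ]
    return sorted(offenders, key=lambda item: item[1], reverse=True)


# Hypothetical traffic: only GET /orders is slow.
observe("GET", "/users", 0.02)
observe("GET", "/users", 0.03)
observe("GET", "/orders", 1.4)
observe("GET", "/orders", 1.6)
observe("POST", "/orders", 0.05)

print(slowest(0.5))
```

Because each observation carries labels, the answer names the culprit (`GET /orders`) instead of saying only that "the application is slow". Note that `POST /orders` stays below the threshold, which a single per-application average would have hidden.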