Here are some sample answers to the questions presented in this chapter:
- We cannot do any live debugging on a production system, for performance and security reasons. This includes interactive and remote debugging. Yet application services can show unexpected behavior due to code defects or infrastructure-related issues such as network glitches or unavailable external services. To quickly pinpoint the reason for the misbehavior or failure of a service, we need as much logging information as possible. This information should point us toward the root cause of the error. When we instrument a service, we do exactly this: we produce as much information as is reasonable in the form of log entries and published metrics.
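
To make the idea of instrumentation concrete, here is a minimal sketch in Python that uses only the standard library. The service name `price-service`, the handler `fetch_price`, the decorator `instrumented`, and the `metrics` counter are all hypothetical names chosen for illustration; the in-process `Counter` only stands in for metrics that a real service would publish through a metrics library.

```python
import functools
import logging
import time
from collections import Counter

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("price-service")  # hypothetical service name

# Assumption: an in-process stand-in for published metrics.
metrics = Counter()


def instrumented(func):
    """Wrap a handler so that every call emits log entries and updates metrics."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        metrics[f"{func.__name__}_calls_total"] += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            # Count the failure and log the stack trace, then re-raise.
            metrics[f"{func.__name__}_errors_total"] += 1
            log.exception("%s failed, args=%r", func.__name__, args)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s finished in %.2f ms", func.__name__, elapsed_ms)

    return wrapper


@instrumented
def fetch_price(item_id):
    """Hypothetical handler; fails on invalid input to show error instrumentation."""
    if item_id < 0:
        raise ValueError("item_id must be non-negative")
    return item_id * 2
```

With this in place, the logs record how long each call took and why a call failed, while the counters track how many calls and errors occurred, so a failure can be investigated without attaching a debugger to the running service.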
- Prometheus is a service that is used to collect functional or non-functional metrics that are provided by other...