Chapter 4 – Low-Level Performance Analysis with Diagnostic Tools
- If your service defines SLIs, check them first and see whether they are within the boundaries defined by your SLOs. In other words, check the key metrics that measure your user experience and see whether they are within healthy limits. For REST API-based services, these are usually the throughput of successful requests and the latency, broken down by API and by any other dimensions that matter for your application.
Resource consumption metrics may correlate with user experience, but they do not determine it. They (and other metrics that describe the internals of your service) can help you understand why the user experience has degraded, and they can predict future issues with some level of confidence.
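The SLI check in the first step can be sketched in a few lines of Python. This is a minimal, illustrative sketch, not the book's tooling. The `Request` and `Slo` types, the `check_slis` and `percentile` functions, the SLO thresholds, and the rule that only 5xx responses count as failures are all assumptions. In practice, these numbers come from your metrics backend rather than from raw request records.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    api: str           # hypothetical route label, e.g. "GET /orders"
    status: int        # HTTP status code
    latency_ms: float  # end-to-end latency in milliseconds


@dataclass
class Slo:
    min_success_ratio: float   # hypothetical target, e.g. 0.999
    max_p99_latency_ms: float  # hypothetical target, e.g. 500.0


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list; p is an integer in 1..100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_slis(requests, window_s, slo):
    """Group requests by API and compare each group's SLIs with the SLO."""
    by_api = defaultdict(list)
    for r in requests:
        by_api[r.api].append(r)

    report = {}
    for api, reqs in by_api.items():
        # Assumption: only 5xx responses count as failures.
        ok = [r for r in reqs if r.status < 500]
        success_ratio = len(ok) / len(reqs)
        # Latency is computed over all requests to this API.
        p99 = percentile([r.latency_ms for r in reqs], 99)
        report[api] = {
            "success_rps": len(ok) / window_s,
            "success_ratio": success_ratio,
            "p99_latency_ms": p99,
            "healthy": success_ratio >= slo.min_success_ratio
            and p99 <= slo.max_p99_latency_ms,
        }
    return report
```

For example, with 99 fast successful requests and one slow 503 response to `GET /orders` in a 60-second window, the success ratio is 0.99. That misses a hypothetical 99.9% target even though the p99 latency is still low, which is exactly the kind of user-facing degradation that resource metrics alone would not reveal.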
- First, we should try to find out which service is responsible: check upstream and downstream services to see whether the load on your service is normal and properly distributed across instances. Check whether dependencies are...
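The load check in the step above can be sketched in the same way. The `check_load` function, the per-instance request rates, the baseline, and the 25% tolerance are hypothetical. Real values would come from your metrics and from the load you normally observe for the service.

```python
from typing import Dict, Tuple


def check_load(
    per_instance_rps: Dict[str, float],
    baseline_total_rps: float,
    tolerance: float = 0.25,
) -> Tuple[bool, Dict[str, float]]:
    """Check total load against a baseline and flag unevenly loaded instances.

    Returns (total_is_normal, outliers), where outliers maps an instance
    to its relative deviation from the mean per-instance rate.
    """
    total = sum(per_instance_rps.values())
    # Is the overall load within +/- tolerance of what we normally see?
    total_is_normal = abs(total - baseline_total_rps) <= tolerance * baseline_total_rps

    outliers: Dict[str, float] = {}
    if per_instance_rps and total > 0:
        mean = total / len(per_instance_rps)
        for instance, rps in per_instance_rps.items():
            # Relative deviation from an even share of the load.
            deviation = (rps - mean) / mean
            if abs(deviation) > tolerance:
                outliers[instance] = deviation
    return total_is_normal, outliers
```

With instances at 100, 100, and 40 requests per second against a baseline of 250, the total load looks normal, but instance `c` receives half the average load. That points to a balancing or routing problem rather than a change in incoming traffic.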