Collecting resource metrics
So far, we have seen that serverless and autonomous service patterns provide us with a lot of fine-grained metrics without the need for explicit instrumentation. Now we need to turn all this data into actionable information so that we can fail forward fast.
Back in 2015, when I first found myself in the role of operations manager, I went looking for guidance to help me build a mental model for reasoning about all this data. One of the best sources I found was a series of posts by Alexis Lê-Quôc, Datadog’s CTO and co-founder, called Monitoring 101 (https://www.datadoghq.com/blog/monitoring-101-collecting-data).
In this series, he recommends grouping observability data into three main categories: work metrics, resource metrics, and system events. In this section, we will dig into resource metrics, covering the USE method and how it applies to the autonomous service patterns and to serverless. Specifically, we will look at...