Collecting service metrics
In this section, we describe another type of service telemetry data: metrics. To understand what metrics are and how they differ from log data, let’s start with an example. Imagine that you have a set of services that provide APIs to their users, and you want to know how many times per second each API endpoint is called. How would you do this?
One way to solve this problem is to use logs. We could create a log event for each request and then count the events for each endpoint, aggregating them by second, by minute, or over any other interval. Such a solution would work until an endpoint receives so many requests that we can no longer log each one individually. Let’s assume there is a service that processes more than a million requests per second. If we used logs to measure its performance, we would need to produce more than a million log events every second, generating lots of data...
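The log-based approach described above can be sketched as follows. This is a minimal illustration in Python, not a production design: the event fields (time, endpoint), the sample data, and the function name requests_per_second are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical log events, one per handled request.
# "time" is a Unix timestamp in seconds (assumed format for this sketch).
LOG_EVENTS = [
    {"time": 100.10, "endpoint": "/users"},
    {"time": 100.75, "endpoint": "/users"},
    {"time": 100.90, "endpoint": "/orders"},
    {"time": 101.20, "endpoint": "/users"},
]


def requests_per_second(events):
    """Count log events per (endpoint, one-second bucket)."""
    counts = Counter()
    for event in events:
        # Truncate the timestamp to whole seconds to get the bucket.
        second = int(event["time"])
        counts[(event["endpoint"], second)] += 1
    return counts


if __name__ == "__main__":
    for (endpoint, second), count in sorted(requests_per_second(LOG_EVENTS).items()):
        print(f"{endpoint} at t={second}: {count} request(s)")
```

Note that the aggregation runs only after every request has already produced its own log event, so the amount of data written and read still grows with the request rate.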