Metrics versus logs
As we saw in the previous chapter, logs are text messages produced as code is executed. They are good at giving visibility into each specific task that the system performs, but they generate a huge amount of data, which is difficult to digest in bulk. As a result, only small groups of logs can be analyzed at any given time.
Normally, the logs analyzed will all be related to a single task. We saw in the previous chapter how to use a request ID for that. But on certain occasions, it may be necessary to check all the logs produced in a particular time window to spot cross-cutting effects, such as a problem in one server that affects every task during a certain period.
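The two ways of selecting logs described above can be sketched in a few lines of Python. This is only an illustration: the record layout (timestamp, server, request_id, message) and the helper names logs_for_request and logs_in_window are assumptions made for the example, not the format used by any particular logging system.

```python
from datetime import datetime

# Hypothetical structured log records; real logs would be parsed from text.
LOGS = [
    {"timestamp": datetime(2024, 1, 1, 10, 0, 1), "server": "web-1",
     "request_id": "req-a", "message": "Request started"},
    {"timestamp": datetime(2024, 1, 1, 10, 0, 2), "server": "web-2",
     "request_id": "req-b", "message": "Request started"},
    {"timestamp": datetime(2024, 1, 1, 10, 0, 3), "server": "web-1",
     "request_id": "req-a", "message": "Request finished"},
    {"timestamp": datetime(2024, 1, 1, 10, 5, 0), "server": "web-2",
     "request_id": "req-c", "message": "Database timeout"},
    {"timestamp": datetime(2024, 1, 1, 10, 5, 2), "server": "web-1",
     "request_id": "req-d", "message": "Database timeout"},
]


def logs_for_request(logs, request_id):
    """Return the logs that belong to a single task, using its request ID."""
    return [entry for entry in logs if entry["request_id"] == request_id]


def logs_in_window(logs, start, end):
    """Return every log in the window [start, end), whatever the task."""
    return [entry for entry in logs if start <= entry["timestamp"] < end]
```

Filtering by request ID follows one task from start to finish, while filtering by time window cuts across tasks and servers, which is what exposes a problem shared by unrelated requests.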
Sometimes, however, the important information is not a specific request but the behavior of the system as a whole. Is the load of the system growing compared to yesterday's? How many errors are we returning? Is the time it takes to process tasks increasing or decreasing?
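To make the contrast with logs concrete, here is a minimal sketch of how those three questions reduce to a handful of aggregated numbers. The sample data, the (day, status code, seconds) layout, and the summarize helper are all hypothetical and chosen only for illustration; treating status codes of 500 and above as errors is also an assumption of this example.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-request records: (day, HTTP status code, seconds to process)
REQUESTS = [
    ("yesterday", 200, 0.12),
    ("yesterday", 200, 0.10),
    ("yesterday", 500, 0.30),
    ("today", 200, 0.15),
    ("today", 500, 0.40),
    ("today", 200, 0.11),
    ("today", 503, 0.35),
]


def summarize(requests):
    """Aggregate individual requests into per-day figures."""
    load = Counter()    # number of requests per day
    errors = Counter()  # number of error responses per day
    times = {}          # processing times per day
    for day, status, seconds in requests:
        load[day] += 1
        if status >= 500:  # assumption: 5xx responses count as errors
            errors[day] += 1
        times.setdefault(day, []).append(seconds)
    return {
        day: {
            "requests": load[day],
            "errors": errors[day],
            "avg_seconds": mean(times[day]),
        }
        for day in load
    }


summary = summarize(REQUESTS)
print(summary["yesterday"])
print(summary["today"])
```

Each day collapses into three numbers regardless of how many requests were served, so comparing today with yesterday does not require reading any individual log line.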