Observability
In previous chapters, we saw how to break down an existing application along bounded context boundaries. We also saw how bounded contexts can be split into extremely fine-grained pieces, often deployed as physically separate components. A failure in any one of these components can disrupt the others that depend on it. Early detection, and more importantly, attribution of a failure to a specific component through a combination of proactive and reactive monitoring, can ideally prevent business disruption or, at the very least, minimize it.
When it comes to monitoring, most teams tend to think first of the technical runtime metrics we associate with components (such as CPU utilization, memory consumption, queue depth, exception count, and so on).
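To make the idea of component runtime metrics concrete, here is a minimal sketch of how a Java component could take a snapshot of a few of them using the JVM's standard java.lang.management platform MXBeans. The class name RuntimeMetrics, the metric names, and the exception counter are hypothetical illustrations, not part of any monitoring product. The one-minute system load average stands in loosely for CPU pressure, and queue depth is omitted because it depends on the messaging infrastructure in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class RuntimeMetrics {

    // Hypothetical application-level counter, incremented wherever the component handles an exception.
    private static final AtomicLong EXCEPTION_COUNT = new AtomicLong();

    public static void recordException() {
        EXCEPTION_COUNT.incrementAndGet();
    }

    // Takes a point-in-time snapshot of a few runtime metrics exposed by the JVM's platform MXBeans.
    public static Map<String, Number> snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("heap.used.bytes", heap.getUsed());
        metrics.put("heap.committed.bytes", heap.getCommitted());
        // One-minute system load average; the JVM returns a negative value if the platform cannot provide it.
        metrics.put("system.load.average", os.getSystemLoadAverage());
        metrics.put("processors.available", os.getAvailableProcessors());
        metrics.put("threads.live", ManagementFactory.getThreadMXBean().getThreadCount());
        metrics.put("exceptions.count", EXCEPTION_COUNT.get());
        return metrics;
    }

    public static void main(String[] args) {
        recordException();
        // Print each metric; a real component would publish these to a monitoring backend instead.
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Metrics like these describe the health of the runtime, but on their own they say little about whether the component is actually delivering what its consumers expect.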
Lending Objectivity to Metrics
To make this more formal, we use the terms Service-Level Objectives (SLOs) and Service-Level Indicators (SLIs), specified within a Service-Level Agreement (SLA), to mean the following...