Alerting
SLIs are quantitative measurements taken at a given point in time, and SLOs use SLIs to reflect the reliability of the system. SLIs are captured as metrics, and monitoring systems evaluate these metrics against a specific set of policies. These policies, which represent the target SLOs over a period, are referred to as alerting rules.
Alerting is the process of evaluating the alerting rules, which track the SLOs, and notifying or performing certain actions when the rules are violated. In other words, alerting converts SLOs into actionable alerts on significant events. Alerts can then be sent to an external application, a ticketing system, or a person.
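As a rough illustration of this flow, the following minimal Python sketch checks a window of SLI samples against an SLO target and calls a notifier when the rule is violated. The names AlertingRule, evaluate, and notify are hypothetical and do not belong to any particular monitoring product; averaging the most recent samples is an assumed evaluation policy chosen only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlertingRule:
    """Hypothetical alerting rule: an SLO target checked over recent SLI samples."""
    name: str
    slo_target: float  # e.g. 0.99 means the averaged SLI must be at least 99%
    window: int        # number of most recent SLI samples to evaluate

    def __post_init__(self) -> None:
        if self.window <= 0:
            raise ValueError("window must be a positive number of samples")

    def is_violated(self, sli_samples: List[float]) -> bool:
        # Only the most recent `window` samples count toward the rule.
        recent = sli_samples[-self.window:]
        if not recent:
            return False  # assumption: no data means no alert in this sketch
        return sum(recent) / len(recent) < self.slo_target


def evaluate(rule: AlertingRule,
             sli_samples: List[float],
             notify: Callable[[str], None]) -> bool:
    """Evaluate one rule and send an alert through `notify` if it is violated."""
    if rule.is_violated(sli_samples):
        notify(f"ALERT {rule.name}: SLI below target {rule.slo_target}")
        return True
    return False


# Usage: collect alerts in a list instead of paging a person.
sent: List[str] = []
rule = AlertingRule(name="availability", slo_target=0.99, window=3)
fired = evaluate(rule, [1.0, 0.99, 0.95, 0.96], sent.append)
quiet = evaluate(rule, [1.0, 1.0, 1.0], sent.append)
```

In this sketch the notifier is any callable that accepts a message, so the same rule could forward alerts to an external application, a ticketing system, or a person.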
Common scenarios for triggering alerts include, but are not limited to, the following:
- The service or system is down.
- SLOs or SLAs are not met.
- Immediate human intervention is required to change something.
As discussed previously, SLOs represent an achievable target, and error...