Defining and leveraging SLOs and SLIs
Service-level indicators (SLIs) and service-level objectives (SLOs) define how we measure health (the SLI) and what is considered healthy (the SLO). A very popular SLI is uptime, a measure of the amount of time a service or system is able to do its job as requested. I often say that the SLI is like a person’s temperature, and the SLO is the range of temperatures that defines healthy. For example, a reading of 106 degrees Fahrenheit is my SLI, and knowing that anything above 104 is an emergency is the SLO.
Now, we often see an SLO identified only as where we want to be, the healthy zone we strive for, but defining levels of health as part of the SLO just makes sense. After all, a 99.5-degree temperature and a 106-degree temperature call for very different responses, because health is rarely a single state.
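To make the idea of levels concrete, here is a minimal sketch of the temperature analogy in Python. The function name, the level names, and the use of 99.5 as the lower edge of an "elevated" band are my assumptions for illustration; the only threshold stated above is that anything over 104 is an emergency.

```python
def health_level(temperature_f: float) -> str:
    """Map a temperature reading (the SLI) to a level of health (the SLO).

    Hypothetical bands: only the 104-degree emergency threshold comes
    from the analogy; the "elevated" band starting at 99.5 is assumed.
    """
    if temperature_f > 104.0:
        return "emergency"   # anything above 104 needs immediate action
    if temperature_f >= 99.5:
        return "elevated"    # worth watching, but not an emergency
    return "healthy"         # within the zone we strive for


if __name__ == "__main__":
    for reading in (98.6, 99.5, 106.0):
        print(reading, health_level(reading))
```

The same shape carries over to a service: one measured value, several bands, and a different response attached to each band.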
I refer to the SLI as the measure of health because it is a specific value that can be directly tied to health. SLIs should be very specific and, most importantly, measurable...
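As one illustration of a specific, measurable SLI, the sketch below computes uptime as a percentage of a measurement window and compares it to a target. The function names, the 30-day window, and the 99.9 percent target are assumptions chosen for the example, not values prescribed here.

```python
def uptime_sli(up_seconds: float, total_seconds: float) -> float:
    """Return uptime as a percentage of the measurement window."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= up_seconds <= total_seconds:
        raise ValueError("up_seconds must be between 0 and total_seconds")
    return 100.0 * up_seconds / total_seconds


def meets_slo(sli_percent: float, target_percent: float = 99.9) -> bool:
    """Check the measured SLI against a hypothetical SLO target."""
    return sli_percent >= target_percent


if __name__ == "__main__":
    window = 30 * 24 * 60 * 60   # assumed 30-day window, in seconds
    downtime = 2_592             # assumed 43.2 minutes of downtime
    sli = uptime_sli(window - downtime, window)
    print(f"uptime SLI: {sli:.3f}%  meets SLO: {meets_slo(sli)}")
```

The point of the sketch is that the SLI is a single number anyone can recompute from the same inputs, which is what makes it usable as a measure of health.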