Binding SRE and Cloud Operations
Chapter 2, SRE Technical Practices – Deep Dive, introduced SRE technical practices such as SLAs, SLOs, SLIs, and Error Budgets. To summarize, that chapter established the relationship between these practices and tied them directly to the reliability of the service. For a service to meet its SLAs, it needs to be reliable. SRE recommends using SLOs to set reliability targets for the service, and SLOs in turn rely on SLIs to measure how reliable the service actually is. If the SLIs fall short, the SLOs miss their targets, and this consumes the Error Budget, which quantifies the acceptable level of unavailability or unreliability. Chapter 3, Understanding Monitoring and Alerting to Target Reliability, introduced concepts related to monitoring, alerting, logging, and tracing and established how these are critical to tracking the reliability of the service. However, both of these chapters were conceptual in nature...
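The relationship recapped above between SLIs, SLOs, and the Error Budget can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a request-based availability SLI (good requests divided by total requests); the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def sli(good_events: int, total_events: int) -> float:
    """Measured reliability: the fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the period."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_consumed(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the Error Budget already burned (above 1.0 means overspent)."""
    bad_events = total_events - good_events
    return bad_events / error_budget(slo_target, total_events)


def slo_met(slo_target: float, good_events: int, total_events: int) -> bool:
    """The SLO is met when the measured SLI is at or above the target."""
    return sli(good_events, total_events) >= slo_target


if __name__ == "__main__":
    # Hypothetical period: 1,000,000 requests, 600 of them failed, 99.9% SLO.
    target, good, total = 0.999, 999_400, 1_000_000
    print(f"SLI:             {sli(good, total):.4%}")
    print(f"Error budget:    {error_budget(target, total):.0f} bad requests")
    print(f"Budget consumed: {budget_consumed(target, good, total):.0%}")
    print(f"SLO met:         {slo_met(target, good, total)}")
```

In this hypothetical period, a 99.9% SLO over one million requests tolerates roughly 1,000 failed requests, so 600 failures burn about 60% of the Error Budget while the SLO itself is still met.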