Summary
In this chapter, we discussed in detail the key SRE technical practices: SLAs, SLOs, SLIs, error budgets, and eliminating toil. This included several critical concepts such as factors that can be used for a well-defined SLA, providing guidelines to set SLOs, categorizing user journeys, detailing sources to measure SLIs along with their limitations, elaborating on error budgets, detailing out factors that can make a service reliable, understanding toil's consequences, and elaborating on how automation is beneficial to eliminate toil. These concepts allow us to achieve SRE's core principle, which is to maintain the balance between innovation and system reliability and thus achieve the eventual goal: build reliable software faster.
In the next chapter, we will focus on concepts required to track SRE technical practices: monitoring, alerting, and time series. These concepts will include monitoring as a feedback loop, monitoring sources, monitoring strategies, monitoring...