Summary
In this chapter, we discussed the concepts of monitoring, alerting, and time series, which are critical for tracking SRE technical practices such as SLOs and error budgets. We also discussed the differences between black box monitoring and white box monitoring. In addition, we examined the four golden signals (latency, traffic, errors, and saturation) that Google recommends as the SLI metrics for a user-facing system.
In the next chapter, we will focus on the constructs required to build an SRE team and on applying cultural practices such as handling incident management, being on call, fostering psychological safety, promoting communication and collaboration, and sharing knowledge.