Achieving reliability through development processes and culture
In this section, we describe techniques for achieving higher service reliability through changes in development processes and culture. You will learn how to establish processes for reviewing and improving your service reliability, how to learn efficiently from service-related issues and incidents, and how to measure your service reliability. We will cover processes and practices that are widely used across the industry, outlining the most important ideas from each one. This section is more theoretical than the previous one; however, it should be equally useful.
First, we provide an overview of the on-call process, which is essential for setting up a mechanism for monitoring issues with your services.
On-call process
When your services start handling production traffic or start serving user requests, one of your first reliability goals should be to detect any issues...