Creating a commitment to reliability
Supporting production systems can be some of the most stressful work we do as software engineers. It may involve late nights, spoiled weekends, or interrupted family occasions. Engineers must sometimes dig into incidents for hours on end with half-asleep brains, all while knowing the company is losing money by the second until systems are back online. It is crucial work that, by its very nature, can be incredibly unpleasant and disruptive. We may strive to make incidents and support scenarios easier to manage and resolve, but we can rarely shield our teams from them completely. For most engineering teams, production support is inevitable.
The challenge of production support is to balance the inherent stress of the work against the reality that systems must remain online and available. Our objective is to support these systems in a way that avoids burnout while still delivering the best possible level of service.
Because this...