Stabilizing and operating the solution
Our goal is to ensure that the production environment remains stable, stays resilient as new changes are introduced, and continues to support sustainable value delivery. To achieve this, we apply the following practices:
- Site reliability engineering
- Failover and disaster recovery
- Continuous security monitoring
- Architecting for operations
- Monitoring NFRs
We have previously looked at testing and monitoring NFRs in Chapter 12, Continuous Deployment to Production. Let’s examine the remaining practices.
Site reliability engineering
We first learned about Site Reliability Engineering (SRE) in Chapter 6, Recovering from Production Failures. In that chapter, we saw the following four practices that site reliability engineers use to maintain the production environment when high availability is required for large-scale systems:
- Formulation of an error budget using Service Level Indicators...