Reliability Design Principles
Reliability refers to the ability of a system to function repeatedly and consistently as expected. What that means in practice depends heavily on the system at hand: ensuring the reliability of a nightly batch application that runs on weekdays is very different from ensuring the reliability of an application that serves requests 24/7.
The reliability pillar of the AWS Well-Architected Framework comprises five design principles to keep in mind when designing a workload for reliability in the cloud.
Principle 1 – Automatically Recover from Failure
“Everything will eventually fail over time,” said Werner Vogels, the CTO of Amazon. You can’t expect humans to constantly watch the vital signs, also known as key performance indicators (KPIs), of each workload you deploy in the cloud and take action whenever something goes wrong. Although you may need to rely...