Understanding high availability
I’m not going to state Murphy’s Law a third time, but understand that it applies here as well. Things will go wrong and they will fall apart. Never forget that. One of the reasons DevOps as a concept and culture became so popular was that its techniques delivered a highly available product with very little downtime, maintenance time, and vulnerability to app-breaking errors.
One of the reasons DevOps succeeds in its mission for high availability is its ability to understand failure, react to failure, and recover from failure. Here’s a famous quote from Werner Vogels, the CTO of Amazon:
“Everything fails, all the time.”
This is, in fact, the foundation of the best-practice guides, tutorials, and documentation that AWS publishes for DevOps operations, and it’s true. Sometimes things fail because of a mistake that has been made. Sometimes they fail because of circumstances that are completely out of our control, and sometimes...