Introducing the reliability pillar
In general, the reliability pillar focuses on the following objectives:
- Ensuring that your architecture is highly available and has no single point of failure (SPOF)
- Planning for business continuity and disaster scenarios such as data loss, downtime, or catastrophic failures
- Testing the high availability and recovery of workloads
Building reliability in the cloud requires a shift in mindset, as we learned at the beginning of this book. In traditional application development, the prime focus was on increasing the average time between system breakdowns, a metric called the mean time between failures (MTBF), and the work was devoted primarily to stopping the system from failing. Cloud workloads are distributed systems, so the approach has to change because of the following factors:
- The complexity of distributed systems is high and a single failure in...