Running a Highly Available Cloud – Meeting the SLA
A major part of operating a cloud successfully is preventing downtime and failures of cloud resources and workloads. In Chapter 1, Revisiting OpenStack – Design Consideration, we drafted an initial design that prepares OpenStack services for redundancy. Chapter 3, OpenStack Control Plane – Shared Services, and Chapter 4, OpenStack Compute – Compute Capacity and Flavors, looked at some of the logical design patterns for OpenStack control plane deployments and at various ways to segregate compute, such as cells and availability zones. OpenStack is designed to scale massively, and dedicating hardware to individual OpenStack services can help isolate failures, but it still requires mechanisms that keep services running during incidents.
Ensuring high availability (HA) in the...