In every production system, disaster recovery (DR) and business continuity (BC) are key concepts that you must bear in mind to ensure the availability of your application workloads. You have to take them into account at an early stage of planning your cluster architecture. The proverb "By failing to prepare, you are preparing to fail" could not be more true for operating distributed systems such as Kubernetes. This chapter focuses on DR when running Kubernetes clusters. BC best practices, such as multiregion Deployments and asynchronous replication of persistent volumes, are outside the scope of this chapter.
In general, disaster recovery consists of a set of policies, tools, and procedures that enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. You can read...