Detecting a Disaster and Testing DR
Before you can take any countermeasures, you first have to detect that a disaster is actually taking place. Your recovery objectives (RTO and RPO) dictate how much time you have to do so. Consider a situation where you have an RTO of 4 hours and an RPO of 1 hour. This means that you have up to 4 hours to recover in case of a disaster, and that you cannot lose more than an hour’s worth of data. It also means that, whenever a disaster occurs, you must be able to detect the event rapidly enough to notify the stakeholders, escalate if needed, and trigger the DR response within 1 hour (to meet your RPO).
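To make the arithmetic concrete, here is a minimal Python sketch of the time budget described above. It follows this section's rule of thumb that the DR response should be triggered within the RPO window. The function name dr_time_budget and the detection and escalation durations are hypothetical values chosen for illustration; only the 4-hour RTO and 1-hour RPO come from the example.

```python
from datetime import timedelta


def dr_time_budget(rto, rpo, detection, escalation):
    """Split an RTO into the delay before the DR response starts and the time left to recover."""
    # Time that passes before the DR response is actually triggered.
    response_delay = detection + escalation
    return {
        "response_delay": response_delay,
        # Whatever remains of the RTO is available for the recovery itself.
        "recovery_time_left": rto - response_delay,
        # Rule of thumb from the text: trigger the response within the RPO window.
        "triggered_within_rpo": response_delay <= rpo,
    }


budget = dr_time_budget(
    rto=timedelta(hours=4),            # from the example above
    rpo=timedelta(hours=1),            # from the example above
    detection=timedelta(minutes=20),   # hypothetical: time until an alarm fires
    escalation=timedelta(minutes=15),  # hypothetical: time to notify and decide
)

print(budget["response_delay"])        # 0:35:00
print(budget["recovery_time_left"])    # 3:25:00
print(budget["triggered_within_rpo"])  # True
```

With these assumed numbers, 35 minutes pass before the response starts, which leaves 3 hours and 25 minutes of the RTO for the recovery itself. If detection and escalation together took longer than the 1-hour RPO, the last check would fail, which suggests that monitoring or escalation paths need tightening.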
There are a number of things you can do to make sure that you detect disasters in time.
First, AWS offers a general service health dashboard that you can check to get the latest status information about AWS services in near real time. You can also subscribe to any of the associated RSS feeds to be notified when a specific...