We know that the PostgreSQL database will be offline at some point in the future. Maybe we need an upgrade to remove a critical security vulnerability or address a potential data corruption issue. Perhaps a RAM module is producing errors and requires immediate replacement. Maybe the primary data center was struck by lightning.
No matter the reason, we need to make decisions quickly. A helpful way to ensure adaptability is to base our decision-making process on user expectations for various levels of liability and context. The QA department will not require the same response level as 10,000 shoppers who can't make a holiday purchase during a critically limited sale.
System outage and response escalation expectations are generally codified in a service-level agreement (SLA). How long should the maintenance last? How often should planned outages...