Detecting failures
For a system to be highly available, failures must be detected and corrective action must be taken. For systems with lower availability requirements, it is acceptable to page the database administrator when anomalies occur and have them wake up and figure out what happened. To provide High Availability that way, you would need a rotation of database administrators sitting on guard in case something bad happens, just to have even a remote chance of reacting fast enough. Clearly, automation is required here.
At first glance, automating failover might seem easy enough: just have the standby server run a script that checks whether the primary is still there and, if not, promotes the standby to primary. This works well enough in the most common scenarios. However, to get High Availability, thinking about common scenarios is not enough. You have to make the system behave correctly in the most unusual situations you can think of, and even more importantly also in situations...
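
The naive check-and-promote script described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a recommended design: the function names, the retry count, and the delay are invented for the example, and `promote` is a hypothetical callback standing in for whatever actually promotes the standby.

```python
import socket
import time
from typing import Callable


def primary_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the primary can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False


def naive_failover_check(
    is_primary_alive: Callable[[], bool],
    promote: Callable[[], None],
    retries: int = 3,                      # assumed value, for illustration
    delay: float = 1.0,                    # assumed value, for illustration
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Promote the standby if the primary fails every one of `retries` checks.

    Returns True if a promotion was triggered, False otherwise.
    """
    for attempt in range(retries):
        if is_primary_alive():
            return False                   # primary answered; do nothing
        if attempt < retries - 1:
            sleep(delay)                   # wait before checking again
    promote()                              # hypothetical promotion hook
    return True
```

On the standby, this might be called as `naive_failover_check(lambda: primary_reachable("primary.example", 5432), promote_standby)`, where the host, the port, and `promote_standby` are placeholders. Note that the check only knows that the standby could not reach the primary. It cannot tell a crashed primary from a network problem between the two machines, which is one of the unusual situations such a script handles badly.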