Introduction
One aspect of PostgreSQL administration, which is unfortunately ignored too frequently, is system monitoring. Provisioning, constructing, and maintaining a high availability cluster is difficult by itself, without the extra complications inherent in setting up yet more infrastructure.
Larger companies with an established Network Operations Center (NOC) probably have extremely mature incidence response and escalation procedures in place. Others may rely on a few basic monitors and alerts or ad hoc scripts set to trigger on certain thresholds. If we aren't part of the first group, we certainly can't include ourselves in the second and consider our cluster protected. When availability is important for business continuity, we should take the time to ensure that its activity is continuously reported, graphed, and summarized.
In this chapter, we will focus on what we should monitor, how often we should check system status, and how to present the data for easy consumption. When the database...