Introducing indirection

What happens to connections to a PostgreSQL system when the service must be shut down for maintenance, or the node itself experiences a hardware problem? Previous recipes have already recommended we integrate at least one data replica into our design, but how should we handle switching between these resources? A great way to achieve high availability is to make server maintenance or replacement as simple as possible.

The concept we'll be exploring in this recipe will be one of anticipating system outages, and even welcoming them, by incorporating proxy techniques into the design.

Getting ready

There are actually several methods for switching from one PostgreSQL node to another. However, when considering the node architecture as a whole, we need to know the four major techniques to handle node indirection:

Domain name reassignment
Virtual IP address
Session multiplexing software
Software or hardware load balancer

In real terms, these are all basically the same thing: a proxy for our PostgreSQL primary node. Keep this in mind as we consider how they may affect our architecture. It would also be a good idea to have some diagram software ready to describe how communication flows through the cluster.

How to do it...

Integrating a proxy into a PostgreSQL cluster is generally simple if we consider these steps in the design phase:

Assign a proxy to the primary node.
Redirect all communication to the primary node through the proxy.
If the proxy requires dedicated hardware or software, designate two to account for failures.

How it works...

These rules are simple, but that's one of the reasons they're often overlooked. Always communicate with the Primary node through at least one proxy.

Even if this is merely an abstract network name, or an ephemeral IP address, doing so prevents problems that could occur, as seen in the following diagram:

What happens when the Primary PostgreSQL node is offline and the cluster is now being managed by the Standby? We have to reconfigure—and possibly restart—any and all applications that connect directly to it. With one simple change, we can avoid that concern, as seen here:

By following the second guideline, all traffic is directed through the Proxy, thus ensuring that either the Primary or Standby will stay online and remain accessible without further invasive changes. Now, we can switch the active primary node, perform maintenance, or even replace nodes entirely, and the application stack will only see the proxy.

We've encountered clusters that do not follow these two guidelines. Sometimes, applications will actually communicate directly with the primary node as assigned by the inventory reference number. This means any time the infrastructure team or vendor needs to reassign or rename nodes, the application becomes unusable for a short period of time.

Sometimes, hardware load balancers are utilized to redirect application traffic to PostgreSQL. On other occasions, this is done with connection multiplexing software such as PgBouncer or HAProxy. In these cases the proxy is not simply a permanent network name or IP address that is associated with the PostgreSQL cluster, but a piece of hardware. This means that a software or hardware failure could also affect the proxy itself.

In this case, we recommend using two proxies, as seen here:

This is especially useful in microarchitectures, which may consist of dozens or even hundreds of different application servers. Each may target a different proxy such that a failure of either only affects the application servers assigned to it.

There's more...

Given that applications must always access PostgreSQL exclusively through the Proxy, we always recommend assigning a reference hostname that is as permanent as possible. This may fit with the company naming scheme, and should always be documented. PostgreSQL nodes may come and go and, in extreme cases, the cluster itself can be swapped for a replacement, but the Proxy is (or should be) forever.

Physical proxy nodes themselves are not immune to maintenance or failure. Thus, it may be necessary to contact the network team to assign a CNAME or other fixture that can remain static even as the proxy hardware fluctuates.