While deceptively simple, there is actually a lot of thought behind where we place the odd node and why we use witness nodes rather than yet another PostgreSQL replica:
- Our first guideline is the most straightforward of these: ensure there is an odd number of nodes in the cluster. Once we have that, any event in the cluster is submitted to the entire quorum for a decision, and only majority agreement triggers subsequent action. Further, since the witness cannot vote for itself, only one eligible node will ever win the election. Consider this sample cluster diagram:
We have three nodes in this cluster and, in the event of a failure of the Primary node, the Witness must vote for the only remaining Replica. If the Witness had been a standard replica node, it could have voted for itself and potentially led to a tied vote. In an automated scenario, this would prevent the cluster from promoting a replacement Primary node.
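To make the voting mechanics concrete, here is a minimal Python sketch of the idea; the `Node` class, the `elect_new_primary()` helper, and the self-nomination rule are all invented for illustration and do not correspond to any particular failover tool:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    is_witness: bool = False   # witnesses vote but can never be elected
    is_alive: bool = True

def elect_new_primary(nodes):
    """Return the winning candidate, or None when no majority is reached."""
    voters = [n for n in nodes if n.is_alive]
    candidates = [n for n in voters if not n.is_witness]

    votes = {}
    for voter in voters:
        if voter.is_witness:
            if not candidates:
                continue              # nothing left to vote for
            choice = candidates[0]    # a witness backs a real replica, never itself
        else:
            choice = voter            # ordinary replicas nominate themselves
        votes[choice.name] = votes.get(choice.name, 0) + 1

    # Promotion requires a majority of the whole cluster, not just the survivors.
    majority = len(nodes) // 2 + 1
    winner = max(candidates, key=lambda c: votes.get(c.name, 0), default=None)
    return winner if winner and votes.get(winner.name, 0) >= majority else None

# The three-node cluster from the diagram, with the Primary down.
cluster = [
    Node("primary", is_alive=False),
    Node("replica"),
    Node("witness", is_witness=True),
]
print(elect_new_primary(cluster).name)  # "replica" wins 2 of 3 votes
```

If the witness were a standard replica instead, both survivors could nominate themselves, producing a 1-1 split that never reaches the two-vote majority, which is exactly the deadlock described above.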
- The second guideline is a variant of this concept. If we already have an odd number of nodes, one of them should be a Witness rather than a standard replica. Consider this diagram:
We can see here that the third node is still a replica, but it also acts as a Witness. Essentially, we don't allow this node to vote for itself to become the new Primary. This kind of role works well for read-only replicas that exist only for application use and is a good way to reuse existing resources.
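As a rough illustration of that distinction, the sketch below marks the third node as non-promotable while leaving it a perfectly ordinary read-only replica; the node names and the `promotable` flag are invented for this example, though many failover managers expose an equivalent setting (repmgr, for instance, treats a node priority of zero as never eligible for promotion):

```python
# Hypothetical inventory for the diagram: the third node keeps serving
# read-only application traffic, but automation must never promote it.
cluster = [
    {"name": "pg-primary", "role": "primary", "promotable": True},
    {"name": "pg-replica-1", "role": "replica", "promotable": True},
    {"name": "pg-replica-2", "role": "replica", "promotable": False},  # witness-replica
]

def promotion_candidates(nodes):
    """Only promotable replicas may stand for election."""
    return [n["name"] for n in nodes
            if n["role"] == "replica" and n["promotable"]]

print(promotion_candidates(cluster))  # ['pg-replica-1']
```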
- The third guideline, placing the Witness in the same location as the Primary node, safeguards node visibility. More important than automation is safety. When there are only two data centers, keeping the Witness alongside the Primary ensures that a network partition (a situation where we lose network connectivity between the data centers) won't result in the alternate location incorrectly promoting one of its replicas. Consider this diagram:
If the connection between Chicago and Dallas is lost, Chicago still has the majority of voting nodes, and Dallas does not. As a result, the cluster continues operating normally until the network is repaired, and we avoid accidentally activating a node in Dallas.
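The arithmetic behind that outcome can be sketched in a few lines; the node and location names follow the diagram, and the `has_quorum()` helper is purely illustrative:

```python
# Each entry is (node_name, location); the witness lives with the Primary.
nodes = [
    ("primary", "chicago"),
    ("witness", "chicago"),
    ("replica", "dallas"),
]

def has_quorum(visible_location, nodes):
    """During a partition, a site only sees its own nodes.
    It may act only if those nodes form a majority of the whole cluster."""
    visible = [name for name, loc in nodes if loc == visible_location]
    return len(visible) > len(nodes) // 2

print(has_quorum("chicago", nodes))  # True:  2 of 3 votes, keeps operating
print(has_quorum("dallas", nodes))   # False: 1 of 3 votes, must not promote
```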
Some failover automation systems also take physical location into account by verifying that all nodes in one location agree that all nodes in the other location are not responding. In these cases, the only time automation will not work normally is when a network partition has occurred. This approach is only viable when more than one node exists in each location, which can be accomplished by allocating further replicas, or even additional witness nodes.
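One possible shape for that cross-check is sketched below; the `reachable` mapping, node names, and helper function are invented for illustration, and a real tool would gather this information through its own health checks:

```python
def remote_location_is_down(local_nodes, remote_nodes, reachable):
    """True only if every local node agrees that every remote node is
    unreachable; a single successful contact vetoes the failover."""
    return all(
        not reachable.get(local, {}).get(remote, False)
        for local in local_nodes
        for remote in remote_nodes
    )

chicago = ["pg1", "pg2"]   # more than one node per location is required
dallas = ["pg3", "pg4"]

# pg2 can still reach pg3, so this looks like a partial outage, not a dead site.
reachable = {
    "pg1": {"pg3": False, "pg4": False},
    "pg2": {"pg3": True, "pg4": False},
}
print(remote_location_is_down(chicago, dallas, reachable))  # False: do not fail over
```

Note that a true partition severs every cross-location link, so all local nodes would agree the remote side is down and the check would pass anyway; that is precisely the one case this automation cannot distinguish from a genuine site failure.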
Unfortunately, our cluster is no longer symmetrical. If we activate the node in Dallas, there are no witnesses in that location, so we must eventually move the Primary back to Chicago. This means every failover must be followed by a manual switchover back to the original location, effectively doubling our downtime.
The easiest way to permanently address these concerns is to add a third location and assign a node there. In most cases, this will be the Witness node itself. Consider this example:
In this case, we may desire that only Chicago or San Jose host the active PostgreSQL node. In the event of a failure of our Primary node, San Jose should take over instead. The Witness can see both data centers and cast its vote accordingly. Furthermore, it doesn't matter whether the Primary is active in Chicago or San Jose, because the Witness is not tied directly to either location.
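Extending the earlier quorum arithmetic to three locations shows why either data center can safely host the Primary; the third location is not named in the example, so the `witness_site` label below is only a stand-in:

```python
nodes = [
    ("primary", "chicago"),
    ("replica", "sanjose"),
    ("witness", "witness_site"),   # the unnamed third location hosting the witness
]

def surviving_majority(lost_location, nodes):
    """Return the nodes that still form a majority when one location goes dark,
    or None if no majority remains."""
    survivors = [name for name, loc in nodes if loc != lost_location]
    return survivors if len(survivors) > len(nodes) // 2 else None

print(surviving_majority("chicago", nodes))  # ['replica', 'witness']: San Jose can promote
print(surviving_majority("sanjose", nodes))  # ['primary', 'witness']: Chicago keeps running
```

Whichever data center becomes unreachable, the surviving data center plus the witness still hold two of the three votes, so the cluster can always reach a safe decision.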