Leveraging multi-master

One significant benefit to using software that enables multiple primary nodes in a PostgreSQL cluster is the associated increase in availability. This functionality can eliminate node promotion time and allow a fully active application stack on all data backends if properly configured.

In this recipe, we'll explore advanced usage of a multi-master cluster, and how it can help us reach the pinnacle of high availability.

Getting ready

It's crucially important to become familiar with the benefits and drawbacks of how multi-master operation can affect the cluster. The previous Incorporating multi-master recipe is a good place to start. Additionally, information we cover here can be directly relevant to the Defining timetables through RTO recipe and the Picking redundant copies recipe.

In a way, this recipe will bring together a lot of concepts we've covered through the chapter, so we recommend covering it last if possible.

How to do it...

To really make the most of multi-master architecture, follow these guidelines:

Always allocate a proxy layer.
If cross-data center latency is relevant, allocate at least two nodes per location.
It's no longer necessary to worry about adding nodes specifically to maintain quorum.
Geographically partition data if possible.

How it works...

We actually recommend applying the first rule to all clusters, as suggested in the Introducing indirection recipe. It's especially important here as the focus is specifically centered on maximizing availability.

Unlike a standard PostgreSQL node, a cluster containing multiple primary nodes does not require the promotion of alternate systems to writable status. This means we can switch to them in a nearly instantaneous manner. A properly configured proxy layer means this is possible without directly alerting the application layer. Such a cluster could resemble this diagram:

Given this configuration, it's possible to switch from one Primary to the other with a pause of mere milliseconds in between. This effectively means zero RTO contribution for that action. This allows us to perform maintenance on any node, essentially without disturbing the application layer at all.

In the preceding configuration, however, we only have one node per location. In the event that the Primary in Chicago fails or is undergoing maintenance, applications in that location will be interacting with the Dallas node. A better design would be something like this:

With two nodes per data center, we're free to swap between them as necessary. If the proxy uses a connection check mechanism, it can even autodetect online status and ensure traffic always goes to the online node in the same location.

The extra Primary per data center need not remain idle when not in use. Some proxy systems can allocate application sessions by user, account, or some other identifying characteristic. This allows safe load balancing that avoids risks associated with multi-master systems, such as conflict resolution.

Pay attention to the preceding diagrams and try to find one common attribute they both share.

Find it yet?

Each cluster has an even number of nodes. Also note that we didn't compensate for this by adding any kind of witness node to help arbitrate the quorum state. This is because each node is a Primary with no failover process to manage. As a consequence, we no longer have the usual cause of split brain, nor must we worry too much about network partition events.

Finally, try, if possible, to arrange the cluster such that data is as closely associated with its users as possible. If users are bank clients interacting with their own account and can be regionalized by country, this is an easy choice. If it's a shared service microarchitecture with applications indiscriminately modifying data from arbitrary accounts, that's not so simple.

For those more advanced circumstances, it's possible to approach the problem from a smaller scale. Perhaps servers in the same rack only communicate with the database nearest to them physically. Perhaps the proxy layer can use sticky sessions and route connections to specific primary nodes based on a stable metric.

The goal here is data locality. While multi-master PostgreSQL allows multiple nodes to ingest writes simultaneously, consider transmission latency. We can observe this in a simple two-node interaction:

Node A accepts a write for Account X.
Node A sends the result to Node B.
The application is stateless and connects to Node B.
The application notices data is missing in node B and submits a change again.
Node B replays data from Node A.
Account X has now been modified twice.

If the application session was tightly coupled to one primary node, this scenario would not be possible. There are numerous ways to accomplish this coupling, and it helps ensure fastest turnaround for associated data that was previously modified in any case.

There's more...

PostgreSQL multi-master solutions use logical replication to transfer data between nodes by necessity. As a result, software versions need not match. This means that PostgreSQL 11 and PostgreSQL 12 nodes may coexist in the same cluster. Combined with a proxy layer as recommended, this allows fully online, major-version upgrades. From an RTO perspective, this means the following elements may all be assumed to contribute zero or a small number of milliseconds:

Node failover and switchover
Minor upgrades (v12.1 to v12.2)
Node maintenance
Major upgrades (v11 to v12)

Due to its proprietary nature, PostgreSQL multi-master is generally not available without additional cost. Consider any associated pricing when tabulating RTO architecture variant cost sheets. This should enable management to make an informed decision based on expenses associated with pursuing extremely low RTO features such as these.