Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
PostgreSQL 12 High Availability Cookbook

You're reading from   PostgreSQL 12 High Availability Cookbook Over 100 recipes to design a highly available server with the advanced features of PostgreSQL 12

Arrow left icon
Product type Paperback
Published in Feb 2020
Publisher Packt
ISBN-13 9781838984854
Length 734 pages
Edition 3rd Edition
Languages
Concepts
Arrow right icon
Author (1):
Arrow left icon
Shaun Thomas Shaun Thomas
Author Profile Icon Shaun Thomas
Shaun Thomas
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Architectural Considerations 2. Hardware Planning FREE CHAPTER 3. Minimizing Downtime 4. Proxy and Pooling Resources 5. Troubleshooting 6. Monitoring 7. PostgreSQL Replication 8. Backup Management 9. High Availability with repmgr 10. High Availability with Patroni 11. Low-Level Server Mirroring 12. High Availability via Pacemaker 13. High Availability with Multi-Master Replication 14. Data Distribution 15. Zero-downtime Upgrades 16. Other Books You May Enjoy

Selecting locations

Once we've decided how many PostgreSQL nodes to allocate in our cluster, where should we put them? Generally, this is easy to answer, but there are some subtleties we need to consider as well. A truly high availability cluster can resist many different types of failure, including where the servers themselves reside.

In this recipe, we will learn all about the ways separate geographical locations can affect our chosen design.

Getting ready

It's time to start drawing diagrams. Find your favorite drawing program, such as Visio, Dia, or Draw.io, or a convenient sheet of paper or whiteboard.

Keep in mind that the ideal design may require more data centers than the company currently utilizes. In these cases, it may be possible to supply sufficient justification to contract at least one more location if it benefits the RPO or RTO. Hence, we recommend following the Setting expectations with RPO recipe and Defining timetables through RTO recipe before continuing here.

How to do it...

Consider these basic guidelines while thinking about how many data centers are necessary, and which nodes should be placed in each:

  1. If data must be available in case of a site outage, use one additional location.
  2. Always place the backup in a separate location if possible.
  3. If two locations are in the same general geographical area, use one additional location at least 100 miles (160 km) away.
  4. If automated failover is desirable, consider at least three data centers.
  5. Place one PostgreSQL server (or witness) in each location.
  6. Continue placing PostgreSQL servers evenly until the count is exhausted.
  7. Try to place witness servers in a location that is unlikely to lose contact with more than one location simultaneously.

How it works...

Let's consider an extreme example to explain how this works: a financial institution wants to place six PostgreSQL nodes, one witness server, and a backup system. This would clearly be a silly design, as shown in the following diagram:

This places every node in a single location that could lose its connection to the internet, lose power, or suffer some other catastrophe that would render the entire database stack unusable or even destroyed.

Now, let's apply the guidelines. First of all, we want to protect the backup; let's place that elsewhere, as seen in this following diagram:

Now, one PostgreSQL server and the backup are safe in case something happens to the first data center. Now of course, we have a new problem: what happens if Chicago itself is somehow isolated from the rest of the internet. Though incredibly rare, major internet backbone outages like this are possible.

So, let's add a third data center in Dallas. This allows us to actually follow three separate rules. We can move the backup to that remote location so it's even safer. We can relocate at least one more PostgreSQL server to that data center as well, so it acts as an alternate in case Chicago becomes unavailable. And finally, we have three data centers, so it's possible to safely use automated failover.

Recipes later in this chapter will explain why we keep insisting that safety comes in odd numbers. For now, just keep it in mind when considering the design.

With these revisions, our cluster looks more like the following diagram:

The only remaining rules suggest that our node distributions should be more even, and that the witness should be less likely to lose contact with more than one location at once. This means we need to move a couple more of our nodes to the other data centers. But what about the witness node? It's actually best to leave it in the second Chicago location. If Chicago is separated from Dallas, the witness is still at least in another data center, and is less likely to lose contact with Chicago, thereby preserving its voting abilities.

Given we're discussing a large financial institution that likely has access to multiple data centers, perhaps there's an even better solution. Following one final node reorganization, the cluster probably looks more like this diagram:

Now, we can utilize three data centers that are relatively diverse in their geographic isolation. This may appear to be an extreme case, but is merely illustrative of the process we recommend to ensure that the overall state of the cluster is as safe as possible.

There's more...

As database sizes increase, sometimes having only a single offsite backup as the only recovery source can be somewhat inconvenient. Not only do we have to wait to copy data from the backup location, but potentially any related time to restore the database instance to working order, and any further recovery steps.

In these cases, it's reasonable and even suggested to maintain a backup server at each major data center. This can be done by backing up a local replica, or by using some other kind of filesystem copy or distribution mechanism synchronizing between the locations.

See also

Please refer to this short list of companies that offer virtual hosting to use as quick supplementary data center locations:

Additionally, diagram software is various and widely available. Here are some of our favorites:

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime