PostgreSQL 12 High Availability Cookbook - Third Edition

Product type Book
Published in Feb 2020
Publisher Packt
ISBN-13 9781838984854
Pages 734 pages
Edition 3rd Edition
Author: Shaun Thomas

Preventing split brain

Split brain is the scenario that occurs when more than one primary node is active in a PostgreSQL cluster at the same time. In these circumstances, if the application writes data to both nodes, the two data sets diverge and become extremely difficult to reconcile. Certainly, no cluster with such data corruption can be considered highly available!

In this recipe, we will explore the concept further and look at how we might mitigate the problem.

Getting ready

An important concept for preventing split brain is fencing: the isolation of a node from the application and database stack. Often, this is accomplished through STONITH (Shoot The Other Node In The Head). Where that is not possible, the old primary must instead invoke SMITH (Shoot Myself In The Head). While this may sound extreme, for servers it is really a temporary measure that prevents a more worrying complication.

Keep these terms in mind while we explore how they may affect our architecture.

How to do it...

Apply these steps when designing a cluster to help minimize or eliminate the risk of split brain:

  1. If available, allocate STONITH hardware for nodes that may take the role of the Primary.
  2. Consider situations where SMITH must be used instead, such as during network interruptions.
  3. Ensure PostgreSQL does not start automatically following a system reboot.

How it works...

Modern servers are often equipped with hardware that enables remote administration. These components often allow network access to the boot process itself. Where PostgreSQL is installed on virtual servers, the hypervisor serves this role. Many companies also install Power Distribution Units (PDUs) that can be remotely instructed to cut power to a server.

Whatever the approach, working with infrastructure or systems operations teams is likely necessary to gain access to these devices. As an example, imagine we have a two-node cluster consisting of a Primary and a Standby, with a PDU available for each.
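A minimal sketch of how a Standby might use such a PDU follows. It only illustrates the ordering rule behind STONITH: confirm that the old Primary is fenced, and only then promote. The names `fence_then_promote`, `cut_power`, and `promote` are hypothetical; real PDUs expose vendor-specific interfaces, and dedicated high availability software normally handles this step.

```python
from typing import Callable


def fence_then_promote(
    cut_power: Callable[[], bool],
    promote: Callable[[], None],
) -> bool:
    """Promote the Standby only after the old Primary is confirmed fenced.

    cut_power: hypothetical callable that asks the Primary's PDU to cut
               power and returns True only when the PDU confirms it.
    promote:   hypothetical callable that promotes the local Standby.
    """
    try:
        fenced = cut_power()
    except Exception:
        # An unreachable PDU is treated the same as a failed fence.
        fenced = False

    if not fenced:
        # Without confirmation the old Primary may still accept writes,
        # so promoting now would risk split brain.
        return False

    promote()
    return True
```

The function refuses to promote when fencing cannot be confirmed, trading availability for safety. That trade-off is an assumption of this sketch, not a rule stated in the recipe.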

In the event of a failover and subsequent promotion, the Standby could instruct the PDU to cut power to the Primary, ensuring that applications can no longer connect to it. But what about environments where network interruptions are far more common, such as between two data centers?

In this scenario, the Standby would be unable to shut down the Primary node during a failover. This is why it's important to consider SMITH approaches as well. Here, the Primary should monitor the state of the Standby and Witness; if both remain disconnected for a sufficient period, it should assume the Standby will be promoted and power itself down to prevent applications from interacting with it.
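The SMITH decision described above can be sketched as a small pure function. This is an illustration under stated assumptions, not a production watchdog: the name `should_self_fence`, the timestamp parameters, and the 30-second default grace period are all hypothetical, and the "sufficient period" should be tuned to the failover timing of the actual cluster.

```python
def should_self_fence(
    now: float,
    last_seen_standby: float,
    last_seen_witness: float,
    grace_seconds: float = 30.0,  # assumed value; tune per cluster
) -> bool:
    """Return True when the Primary should power itself down.

    All timestamps are seconds on the same clock, for example values
    taken from time.monotonic(). The Primary self-fences only when BOTH
    the Standby and the Witness have been out of contact for longer than
    the grace period; losing just one of them is not enough.
    """
    standby_lost = (now - last_seen_standby) > grace_seconds
    witness_lost = (now - last_seen_witness) > grace_seconds
    return standby_lost and witness_lost
```

A monitoring loop on the Primary would update the two timestamps whenever a health check succeeds and call this function on every tick, shutting the node down when it returns True.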

Finally, always disable the PostgreSQL startup mechanism on clusters equipped with high availability management software. That software should manage starting and stopping the service. Disabling automatic startup also prevents unintended events, such as two primary nodes becoming active simply because a reboot started a previously failed PostgreSQL node.

Advanced recipes later in this book will adhere to the rule of disabling PostgreSQL on startup and provide exact instructions for doing so.
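Until then, here is a rough sketch of the idea on a systemd-based server. The unit name `postgresql` is an assumption, since the actual name varies by distribution and packaging, and the helper names are hypothetical. By default the function only returns the command it would run, so it can be inspected safely.

```python
import subprocess
from typing import List


def disable_autostart_command(unit: str = "postgresql") -> List[str]:
    """Build the systemctl command that stops a unit from starting at boot.

    The default unit name is an assumption; check the real name on the
    target system before using it.
    """
    return ["systemctl", "disable", unit]


def disable_autostart(unit: str = "postgresql", dry_run: bool = True) -> List[str]:
    """Return the command, and run it only when dry_run is False.

    Running it requires root privileges on a systemd-based host.
    """
    cmd = disable_autostart_command(unit)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `disable_autostart("postgresql-12")` returns the command list without touching the system, while passing `dry_run=False` executes it.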

There's more...

Preventing split brain is a genuinely hard problem, which is why dedicated software for managing high availability exists. While these tools are not immune to the issue, they greatly reduce the likelihood of it occurring.

Pacemaker has components specifically for interacting with STONITH hardware. repmgr implements the concept of hook scripts for event notifications, and accounts for scenarios where the Primary is isolated from the remainder of the cluster, as in the two data center example described earlier. Patroni uses a sophisticated locking mechanism that only allows one primary node to be registered at once.
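To give a feel for how a lock can guarantee a single registered primary, here is a toy lease with a time-to-live. It is an in-memory illustration of the general technique only, not Patroni's code or API; the `LeaderLease` class and its methods are hypothetical, and a real deployment keeps the lock in a shared, consistent store rather than in one process.

```python
from typing import Optional


class LeaderLease:
    """In-memory stand-in for a leader lock with a time-to-live."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._holder: Optional[str] = None
        self._expires_at: float = 0.0

    def acquire(self, node: str, now: float) -> bool:
        """Take or renew the lease; fail while another node holds it."""
        expired = now >= self._expires_at
        if self._holder is None or expired or self._holder == node:
            self._holder = node
            self._expires_at = now + self.ttl
            return True
        return False

    def holder(self, now: float) -> Optional[str]:
        """Return the current holder, or None once the lease has expired."""
        return self._holder if now < self._expires_at else None
```

A node that holds the lease must keep renewing it before the time-to-live runs out. If it stops, for example because it is isolated from the network, the lease expires and another node can take over, while the old holder knows it is no longer entitled to act as primary.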

Don't try to invent a solution for an already solved problem when most of the work has already been done by companies dedicated to the cause.
