Making use of new replication and backup add-ons
As you’ve worked your way through some of the new developer-related features, you’re now ready to address the new version’s powerful set of advanced features related to database administration. In this section, we’ll delve into the more complex world of database management, exploring topics that are new to PostgreSQL 17.
More powerful pg_dump, again
pg_dump
is the single most well-known tool to run a basic backup in PostgreSQL. It is a command-line utility that comes with PostgreSQL, used for backing up a PostgreSQL database or extracting its schema and data in a format suitable for loading into another PostgreSQL database. The main question is: after 38 years of development, what might have been added to this tool that is not already there? Well, the answer is that you can now define a file that configures what you want to dump and what you want to ignore. By adding the --filter
option, we can feed a file containing all our desired rules.
Handling incremental base backups
Talking about backups in general, pg_basebackup
has also been extended. PostgreSQL 17 supports the idea of incremental base backups. Why is that important? Often, we might want to use a simple backup policy such as “Take a base backup every night and keep it for 7 days.” The problem is that if your database is large (say, 50 TB) but static (virtually no changes), you will waste a lot of space just to store the backup, which can, of course, lead to serious cost considerations. Incremental base backup addresses this issue:
summarize_wal = on wal_summary_keep_time = '7d'
A new process called summarizer was added to PostgreSQL. It will keep track of all those blocks that have been changed and help pg_basebackup
to only copy those blocks that have indeed been touched, which reduces the amount of space needed for the backups to drop significantly.
Here is how it works:
pg_basebackup -h source_server.com \ -D /data/full --checkpoint=fast ... pg_basebackup -h source_server.com \ --checkpoint=fast \ --incremental=/data/full/backup_manifest \ -D /backup/incremental
The secret to success is the backup manifest that is needed to run the incremental backup. It contains all the necessary information to tell the tooling what has to be done.
After running those two commands, we have a full backup as well as an incremental one. The question now is: how can we combine those things together and turn them into something usable? The following command shows how this works:
$ pg_combinebackup --help pg_combinebackup reconstructs full backups from incrementals. Usage: pg_combinebackup [OPTION]... DIRECTORY... Options: -d, --debug generate lots of debugging output -n, --dry-run do not actually do anything -N, --no-sync do not wait for changes to be written safely to disk -o, --output output directory -T, --tablespace-mapping=OLDDIR=NEWDIR relocate tablespace in OLDDIR to NEWDIR --clone clone (reflink) instead of copying files --copy-file-range copy using copy_file_range() syscall --manifest-checksums=SHA{224,256,384,512}|CRC32C|NONE use algorithm for manifest checksums --no-manifest suppress generation of backup manifest --sync-method=METHOD set method for syncing files to disk -V, --version output version information, then exit -?, --help show this help, then exit
pg_combinebackup
does exactly what we want. It creates the desired set of files that are then needed for recovery. Given our example, we could use the following instruction to combine our full backup with our incremental backup:
pg_combinebackup -o /data/combined \ /data/full \ /backup/incremental
What is noteworthy here is that this process works for one base backup and exactly one incremental backup. However, in real life, we might have to apply a set of incremental backups to reach the desired state. In this case, we can simply list all those incremental ones one after the other, as shown in the next listing:
pg_combinebackup -o /data/combined \ /data/full \ /backup/incremental \ /backup/incremental2 \ /backup/incremental3
Simply list all the incremental backups to produce the desired state.
Logical replication upgraded
In PostgreSQL, there are two types of replication: physical (binary) and logical (text) replication. While binary replication is ideal for all kinds of backup, logical replication has become more and more widespread in heterogeneous environments to achieve cross-cloud portability.
The trouble is that publications and subscriptions (the backbone of logical replication) were lost during pg_upgrades
prior to PostgreSQL 17. This has now changed and has significantly eased the burden.
Adding pg_createsubscriber
In the new release, we can all enjoy a new command-line tool called pg_createsubscriber
. What is the purpose of this new tool? When people decide to use logical replication, the initial sync phase can take quite a while – especially when the database instance is large. pg_createsubscriber
has been designed to help solve this problem. It converts a physical standby (binary replication) and turns it into a logical standby by wiring all the publications, subscriptions, and so on for you. For each database, a replication set will be created and automatically configured. The command has to be executed on the target system.