Warm and hot standby restore (Intermediate)
Another option for restoring a PostgreSQL backup is to restore indefinitely. A backup restored this way would always be a working copy of the database, available with a single activation command (warm), or online for read-only queries (hot).
Getting ready
Please refer to the Getting a basic export (Simple) recipe to bootstrap our database. Before we start the server to create the sample database, we need to change a few settings in postgresql.conf as shown in the following code:
wal_level = hot_standby
archive_mode = on
archive_command = 'rsync -a %p postgres@backup::archive/%f'
Next, we will want to set up the backup server with an rsyncd.conf file:
[db_sync]
path = /db/pgdata
comment = DB Backup Server
uid = postgres
gid = postgres
read only = false
use chroot = true

[archive]
path = /db/archive
comment = Archived Transaction Logs
uid = postgres
gid = postgres
read only = false
use chroot = true
Make sure /db/pgdata and /db/archive exist on both servers and that the rsync daemon has been restarted. Then start the main database and use pgbench to initialize our sample data.
How to do it...
The rsync backup step that we will perform also acts as a restore. The data on the backup server will be almost ready to run. Execute the following commands from the main database server:
1. Start by entering backup mode:
$> psql -c "SELECT pg_start_backup('standby');" postgres
2. Next, send the data files to the backup server:
$> rsync -a --exclude=pg_xlog /db/pgdata/ \
   postgres@backup::db_sync
3. Then close backup mode:
$> psql -c "SELECT pg_stop_backup();" postgres
Once the data is copied, go to the backup server and finish setting up the database. Remember, this is on the backup server, not the primary server. The following are the steps to finish the setup:
1. Create a pg_xlog directory so the database will start:
$> mkdir /db/pgdata/pg_xlog
2. Remove the backup label, since it is not necessary for recovery:
$> rm /db/pgdata/backup_label
3. The postmaster files are created during operation for bookkeeping, and can cause problems when restarting if certain flags are omitted. Remove them:
$> rm /db/pgdata/postmaster.*
4. Create a recovery.conf file on the backup server with the following contents:
standby_mode = on
trigger_file = '/tmp/promote_db'
restore_command = 'cp -f /db/archive/%f "%p"'
Then start the backup copy of the database:
$> pg_ctl -D /db/pgdata start
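For repeatability, the three commands run on the primary can be wrapped in a small shell function. This is only a sketch using the example host name (backup), rsync module (db_sync), and data path (/db/pgdata) from this recipe; the function itself is not part of the recipe:

```shell
#!/bin/bash
# Sketch: the three base backup steps from this recipe as one function.
base_backup() {
    # 1. Enter backup mode on the primary
    psql -c "SELECT pg_start_backup('standby');" postgres || return 1

    # 2. Copy the data files, excluding the transaction logs
    rsync -a --exclude=pg_xlog /db/pgdata/ postgres@backup::db_sync
    local copy_status=$?

    # 3. Always leave backup mode, even if the copy failed
    psql -c "SELECT pg_stop_backup();" postgres
    return $copy_status
}
```

Note that pg_stop_backup() runs even when rsync fails: leaving the cluster in backup mode indefinitely is worse than a failed copy, which can simply be retried.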
How it works...
We are starting to combine many of the techniques we have learned so far. Primary among those is the ability to copy data from one server to another without the intermediate step of first transforming it to a single file.
We start the process by setting up the main server to produce transaction logs in hot_standby mode. Doing so is necessary if we want to utilize the database in read-only mode on the backup server. We also modify the archive command to transfer archived transaction logs directly to the backup server. This is just one method of doing so, though a fairly convenient one.
Having both of the rsync target paths set up in advance facilitates these types of transfers. Data synchronization is necessary for starting the backup from scratch at any time, and we need the archive log target so that the backup can stay in recovery mode.
The backup and restore process itself is something we covered before. Here, we simply copy all of the data from the main database except for the transaction logs. Those will be copied by the database using the recovery.conf file that controls the restore process. That recovery should begin as soon as the backup copy of the database is started.
With this setup, if we should try to connect to the backup database, it will respond with:
psql: FATAL: the database system is starting up
That's okay for now, because we know the database is properly in recovery mode and keeping itself as a fresh backup of our main copy. This status is what's known as warm standby, and for a very long time, was the only way PostgreSQL could do standby operations.
There's more...
Simply having a standby database is a very powerful backup technique. But how do we actually utilize the backup copy?
Trigger a backup online
The easiest method for activating the backup database, so that it acts as the main database would, is to use a trigger file. This can actually be done by any user who can access the backup server. In our recovery.conf, we defined a trigger file:
trigger_file = '/tmp/promote_db'
If this trigger file exists, PostgreSQL will delete it, stop processing transaction logs, and restart the database as a normal operating server. On secure, dedicated backup servers, this is a valid and easy way of activating a backup server and making it a normal online database.
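The file-based handshake is simple enough to simulate in plain shell. The following is only an illustration of the behavior described above (check for the file, delete it, leave recovery), not PostgreSQL's actual implementation, and it uses a temporary path instead of /tmp/promote_db so it cannot interfere with a real standby:

```shell
#!/bin/bash
# Illustrate the trigger-file handshake: when the file appears,
# remove it and report that recovery is over.
TRIGGER=$(mktemp -u)   # unique path; the file does not exist yet

promote_if_triggered() {
    if [ -f "$TRIGGER" ]; then
        rm -f "$TRIGGER"        # PostgreSQL deletes the trigger file...
        echo "leaving recovery" # ...then exits recovery and goes online
        return 0
    fi
    return 1                    # no file: remain in standby mode
}

promote_if_triggered || echo "still in recovery"  # no file yet
touch "$TRIGGER"        # any user with filesystem access can do this
promote_if_triggered    # prints "leaving recovery"
```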
Backup database promotion
The second method utilizes another built-in capability of the venerable pg_ctl tool. To restart without a trigger file, we do something called a database promotion using the following command:
$> pg_ctl -D /db/pgdata promote
This has the same effect as the trigger file, but can only be used by the user who started or owns the PostgreSQL service. On less secure servers, it might be a good idea to use the previous command instead of the trigger_file line in the recovery.conf file.
Hot standby
Believe it or not, a standby server does not simply have to process transaction logs and wait to be promoted or triggered. In fact, it is now very common to run one or more backup servers for various purposes related to load balancing.
To make a backup database readable, simply modify its postgresql.conf file and enable hot_standby:
hot_standby = on
So long as the main database also has wal_level set to hot_standby, and had that setting before the backup was taken, backup databases can also service read-only queries.
Asynchronous archival
In some ways, using rsync as the archive_command is a bad practice. If there are multiple backup or standby servers, where should the transaction logs be sent? There's also a small risk that the target becomes unresponsive and files build up waiting for archival until they consume all available drive space.
Another way to ensure that the archive logs make it to the backup server is to utilize the Unix cron system. Let's make a cron entry on the main database server for the postgres user by calling crontab -e:
* * * * * rsync -a /db/archive/ postgres@backup::archive
This will check every minute and ensure that the /db/archive directories are the same on both servers. Of course, for this to work, we should revert to the old archive_command in postgresql.conf:
archive_command = 'test -f /db/archive/%f || cp %p /db/archive/%f'
With this in place, PostgreSQL will archive old transaction log files locally, and an asynchronous background process will ensure that these files reach the backup server eventually.
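The test -f guard in this archive_command is what makes it safe to retry: if a segment was already archived, it is left alone rather than copied again. A quick standalone demonstration of that logic, using throwaway directories in place of the real pg_xlog and /db/archive paths:

```shell
#!/bin/bash
# Demonstrate the idempotent copy used by the archive_command above.
# %p/%f stand for a WAL segment's path/name; here we fake one segment.
WAL_DIR=$(mktemp -d)      # stands in for the pg_xlog directory
ARCHIVE=$(mktemp -d)      # stands in for /db/archive

SEG=000000010000000000000001          # a plausible WAL segment name
echo "original" > "$WAL_DIR/$SEG"

# First run: the file is absent from the archive, so it gets copied.
test -f "$ARCHIVE/$SEG" || cp "$WAL_DIR/$SEG" "$ARCHIVE/$SEG"

# Second run: the file is already archived, so cp is skipped entirely.
echo "changed" > "$WAL_DIR/$SEG"
test -f "$ARCHIVE/$SEG" || cp "$WAL_DIR/$SEG" "$ARCHIVE/$SEG"

cat "$ARCHIVE/$SEG"   # still "original": the retry did not overwrite it
```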
Tip
You can set up the synchronization job on the backup server instead, and reverse the direction of the rsync, so that the backup server pulls the files from the primary database. By doing this, you can set up multiple backup servers for different purposes.
It is also a good idea to set up a cleanup process to prune very old transaction log files. Having an asynchronous transmission method is safer, but a little more difficult to maintain. We can use cron, similarly to the rsync job, to do this:
0 * * * * find /db/archive -type f -mtime +7 -delete
Every hour, this would delete any transaction log files older than one week.
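Before putting a deleting job in cron, the retention window can be verified safely against throwaway files. A sketch assuming GNU touch and find, as found on most Linux systems:

```shell
#!/bin/bash
# Show that -mtime +7 only matches files older than seven days.
DIR=$(mktemp -d)
touch "$DIR/recent.wal"                    # modified just now
touch -d "8 days ago" "$DIR/old.wal"       # back-dated (GNU touch)

find "$DIR" -type f -mtime +7 -print       # lists only old.wal
find "$DIR" -type f -mtime +7 -delete      # same expression as the cron job

ls "$DIR"                                  # only recent.wal remains
```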