Synchronizing backup servers (Intermediate)
Binary backups with tar
create an intermediate file for archival purposes. Instead, it might be better to back up directly to another server to have a full running backup. To do this, most administrators rely on rsync
, which remotely synchronizes data between two servers. For low volume servers, this can be much faster than tar
as well.
Getting ready
Please refer to the Getting a basic export (Simple) recipe on preparing a sample
database.
Our server will be running in the /db
directory on both systems. Make sure the path of the directory exists and is owned by the postgres
user. On the remote server that will be receiving the backup, create a file named /etc/rsyncd.conf
with the following content:
[db_sync] path = /db comment = DB Backup Server uid = postgres gid = postgres read only = false use chroot = true
There are other available options for securing the remote server, but for now, we will ignore those options for the purposes of demonstration.
How to do it...
Creating an rsync
backup comes in three basic steps:
Begin by putting the database in
backup
mode:$> psql -c "SELECT pg_start_backup('label');" postgres
Next, use the
rsync
command to synchronize the contents of the two servers:$> rsync -a -v -z /db/ postgres@remote_server::db_sync
Finally, end
backup
mode:$> psql -c "SELECT pg_stop_backup();" postgres
How it works...
In our initial steps, we configure the rsync
daemon with a module to accept file transfers. It's configured to run as the postgres
user by default since that is a very common setup. We also enable symbolic links because PostgreSQL uses them extensively in tablespaces. We do not want to diverge too far from our copied server, so all paths should remain the same if possible.
As with a tar
backup, we tell PostgreSQL that we're starting a backup. This is not strictly necessary since we're making an exact copy of the database as it runs, but is a good practice.
Next, we use rsync
to physically copy the data to the remote server. We add the -a
flag to copy all data attributes such as file ownership and permissions. The -v
flag is simply to increase verbosity so we can watch the copy progress. Again, such a flag is not necessary but useful for illustrative purposes.
Tip
To shorten the duration of the backup and lower the amount of transaction log files, you will need to track and execute the same command before starting backup
mode. The rsync
command will then only copy changed files since the first synchronization.
Finally, we stop the backup. If we never entered PostgreSQL's backup
mode, we don't need to end it either. In reality, the entire backup can be done with the rsync
command alone.
Tip
The rsync
command does not need to work on a push model. Instead, the rsyncd.conf
file could reside on the server itself, and backup servers could fetch files from the master copy. This may even be the suggested method if there are several backup servers that need to be updated asynchronously.
There's more...
If we check the manual page for rsync
, we can see that there are several other highly useful switches that can modify or control the data stream. There are a lot of useful switches to discuss.
Speeding up rsync
The rsync
command operates in real time, but before it starts, it makes a list of all files that need to be copied. Since the database is running while we are copying its contents, new files may appear during the backup that will not get copied, and old files may vanish. The best way to use rsync
is to actually run it multiple times with an extra parameter as shown in the following commands:
$> rsync -a -v -z /db/ postgres@remote_server::db_sync $> rsync -a -v -z --delete-after /db/ postgres@remote_server::db_sync
The first command may not complete for several hours depending on the database size, so the second should be executed once or twice to drastically shorten the number of changed files we need to copy.
Afterwards, it's good practice to shut down the database, and do one final sync. This guarantees that no files have changed and the backup is fully valid. Because we ran earlier synchronizations, downtime should be very short.
Whole file sync
PostgreSQL keeps most table data in 1 GB chunks. To avoid recopying data, rsync
will normally compute differences between files on each server. This can actually take a very long time for such large files. If enough bandwidth is available, it may actually be easier to simply transfer the whole file than the differing parts using the following command:
$> rsync -W -a -v -z /db/ postgres@remote_server::db_sync
The -W
flag does exactly this.
Exclude transaction logs
If we set up a backup copy for replication, we do not need the pg_xlog
directory, so we can exclude that as well.
$> rsync -a -v -z --exclude=pg_xlog /db/ \ postgres@remote_server::db_sync