Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Instant PostgreSQL Backup and Restore How-to

You're reading from   Instant PostgreSQL Backup and Restore How-to A step-by-step guide to backing up and restoring your database using safe, efficient, and proven recipes

Arrow left icon
Product type Paperback
Published in Mar 2013
Publisher Packt
ISBN-13 9781782169109
Length 54 pages
Edition 1st Edition
Languages
Arrow right icon
Toc

Synchronizing backup servers (Intermediate)


Binary backups with tar create an intermediate file for archival purposes. Instead, it might be better to back up directly to another server to have a full running backup. To do this, most administrators rely on rsync, which remotely synchronizes data between two servers. For low volume servers, this can be much faster than tar as well.

Getting ready

Please refer to the Getting a basic export (Simple) recipe on preparing a sample database.

Our server will be running in the /db directory on both systems. Make sure the path of the directory exists and is owned by the postgres user. On the remote server that will be receiving the backup, create a file named /etc/rsyncd.conf with the following content:

[db_sync]
    path = /db
    comment = DB Backup Server
    uid = postgres
    gid = postgres
    read only = false
    use chroot = true

There are other available options for securing the remote server, but for now, we will ignore those options for the purposes of demonstration.

How to do it...

Creating an rsync backup comes in three basic steps:

  1. Begin by putting the database in backup mode:

    $> psql -c "SELECT pg_start_backup('label');" postgres
    
  2. Next, use the rsync command to synchronize the contents of the two servers:

    $> rsync -a -v -z /db/ postgres@remote_server::db_sync
    
  3. Finally, end backup mode:

    $> psql -c "SELECT pg_stop_backup();" postgres
    

How it works...

In our initial steps, we configure the rsync daemon with a module to accept file transfers. It's configured to run as the postgres user by default since that is a very common setup. We also enable symbolic links because PostgreSQL uses them extensively in tablespaces. We do not want to diverge too far from our copied server, so all paths should remain the same if possible.

As with a tar backup, we tell PostgreSQL that we're starting a backup. This is not strictly necessary since we're making an exact copy of the database as it runs, but is a good practice.

Next, we use rsync to physically copy the data to the remote server. We add the -a flag to copy all data attributes such as file ownership and permissions. The -v flag is simply to increase verbosity so we can watch the copy progress. Again, such a flag is not necessary but useful for illustrative purposes.

Tip

To shorten the duration of the backup and lower the amount of transaction log files, you will need to track and execute the same command before starting backup mode. The rsync command will then only copy changed files since the first synchronization.

Finally, we stop the backup. If we never entered PostgreSQL's backup mode, we don't need to end it either. In reality, the entire backup can be done with the rsync command alone.

Tip

The rsync command does not need to work on a push model. Instead, the rsyncd.conf file could reside on the server itself, and backup servers could fetch files from the master copy. This may even be the suggested method if there are several backup servers that need to be updated asynchronously.

There's more...

If we check the manual page for rsync, we can see that there are several other highly useful switches that can modify or control the data stream. There are a lot of useful switches to discuss.

Speeding up rsync

The rsync command operates in real time, but before it starts, it makes a list of all files that need to be copied. Since the database is running while we are copying its contents, new files may appear during the backup that will not get copied, and old files may vanish. The best way to use rsync is to actually run it multiple times with an extra parameter as shown in the following commands:

$> rsync -a -v -z /db/ postgres@remote_server::db_sync
$> rsync -a -v -z --delete-after /db/ postgres@remote_server::db_sync

The first command may not complete for several hours depending on the database size, so the second should be executed once or twice to drastically shorten the number of changed files we need to copy.

Afterwards, it's good practice to shut down the database, and do one final sync. This guarantees that no files have changed and the backup is fully valid. Because we ran earlier synchronizations, downtime should be very short.

Whole file sync

PostgreSQL keeps most table data in 1 GB chunks. To avoid recopying data, rsync will normally compute differences between files on each server. This can actually take a very long time for such large files. If enough bandwidth is available, it may actually be easier to simply transfer the whole file than the differing parts using the following command:

$> rsync -W -a -v -z /db/ postgres@remote_server::db_sync

The -W flag does exactly this.

Exclude transaction logs

If we set up a backup copy for replication, we do not need the pg_xlog directory, so we can exclude that as well.

$> rsync -a -v -z --exclude=pg_xlog /db/ \
  postgres@remote_server::db_sync
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime