Taking snapshots (Advanced)
Filesystem snapshots can provide a stronger integrity guarantee for PostgreSQL backups, because files in a snapshot do not change during the backup. On Linux systems, the Logical Volume Manager (LVM) provides this capability. On BSD systems, or specially altered Linux systems, ZFS is also available. On expensive SAN systems, this is often a built-in function. Due to its wide availability, we will cover LVM in this recipe.
Getting ready
Let's assume we have a 500 GB raw block device provided as /dev/sdb. This could be an internal RAID array, a LUN from SAN storage, or even just an internal disk. The following commands will make the /db path available to initialize as a new database:
$> pvcreate /dev/sdb
$> vgcreate db_group /dev/sdb
$> lvcreate -L 450G -n db_volume db_group
$> mkfs -t ext4 /dev/db_group/db_volume
$> mount /dev/db_group/db_volume /db
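To verify that roughly 50 GB of the volume group remains unallocated for snapshots, we can inspect it with the vgs command; the VFree column should report about 50 GB:

$> vgs db_group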
Please refer to the Getting a basic export (Simple) recipe on preparing a sample database.
How to do it...
With LVM in place, there are several new commands available to manage the volume itself. Let's back up the entire contents of the /db directory using a filesystem snapshot, with the following steps:
1. Start by creating a new snapshot volume:

$> lvcreate -s -L 40G -n db_snap db_group/db_volume

2. Next, mount the snapshot so tar can copy the database files:

$> mkdir /mnt/snap
$> mount /dev/db_group/db_snap /mnt/snap

3. Now, perform a tar backup:

$> tar -C /mnt/snap -c -z -f /backup/db_backup.tar.gz .

4. Finally, unmount and destroy the snapshot volume:

$> umount /mnt/snap
$> lvremove -f db_group/db_snap
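In practice, it is worth wrapping these steps in a small script so the snapshot is always removed, even when the backup fails partway through. The following is a minimal sketch under the same assumptions as above (identical volume names and paths), not a hardened production script:

#!/bin/sh
# Sketch: snapshot backup of /db with guaranteed cleanup.
# Assumes db_group/db_volume, /mnt/snap, and /backup all exist.
set -e
lvcreate -s -L 40G -n db_snap db_group/db_volume
# Remove the snapshot no matter how the script exits.
trap 'umount /mnt/snap 2>/dev/null; lvremove -f db_group/db_snap' EXIT
mkdir -p /mnt/snap
mount /dev/db_group/db_snap /mnt/snap
tar -C /mnt/snap -c -z -f /backup/db_backup.tar.gz .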
How it works...
The process itself is basically a wrapper for whatever backup method we want to choose. During the setup, we create a physical volume that represents the actual device. On top of that goes the volume group, which allows us to group several devices or volumes together. Lastly comes the logical volume itself. We only allocate 450 GB of the total size, so we can use up to 50 GB for snapshots.
When a snapshot volume is created (lvcreate -s), it is tied to the named source volume and uses space in the same volume group. In this case, we used 40 GB of the available 50 GB. While the snapshot volume exists, the original contents of any blocks changed on the source volume are copied into this space (copy-on-write). This means our example will allow up to 40 GB of data to change before LVM notices that the space is exhausted and automatically invalidates the snapshot volume.
When we mount the snapshot volume, we actually see an unchanged copy of the /db directory, mounted at /mnt/snap instead. Our running applications see /db normally, so they can keep operating without interruption. This is why we direct the tar backup command at the snapshot mount and back up everything inside it. This data is frozen in time, as if the database had simply stopped operating.
When the backup is finished, we simply unmount and remove the volume. This prevents possible problems from LVM destroying a full snapshot, and lets us reuse the space in subsequent backup operations.
This kind of backup doesn't need the pg_start_backup or pg_stop_backup commands, because no new transaction logs can be written to the snapshot, and nothing can be archived. The files we backed up can be extracted elsewhere and started as a separate database, which will behave as if it had crashed while operating.
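As a rough sketch of that recovery path (the /db_restore directory and use of pg_ctl here are illustrative, and the commands should be run as the postgres user), extracting the backup and starting it elsewhere looks like this:

$> mkdir /db_restore
$> tar -C /db_restore -x -z -f /backup/db_backup.tar.gz
$> chmod 700 /db_restore
$> pg_ctl -D /db_restore start

PostgreSQL then performs normal crash recovery on startup.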
There's more...
It's easy to see that snapshots are very powerful, but there are several elements to keep in mind while using them.
Proper snapshot sizing
Why did we choose 40 GB for our snapshot volume? In our case, this was just an illustrative amount. In a real scenario, we would want to carefully calculate a suitable snapshot size. This might be as simple as multiplying the average data turnover by the usual backup duration, and then doubling that amount for safety. We might also back up during a less active time of day (such as midnight) with a semi-large snapshot.
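For example, using purely illustrative numbers: if about 4 GB of data changes per hour and the backup usually takes five hours, then 4 GB × 5 × 2 = 40 GB would be a defensible snapshot size.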
The point to remember is that the snapshot must supply all necessary space for any changed blocks on any file in the source volume, including any new files generated, such as transaction logs. The easiest way to see how much of this space is in use is with another LVM command:
$> lvdisplay db_group/db_snap
The section entitled Allocated to snapshot shows how much of the volume is being used. Creating and monitoring a temporary snapshot is a good way to gauge the rate of data turnover before scheduling a real snapshot backup.
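For repeated monitoring, the lvs command offers a more compact view; its Data% column reports the same snapshot usage figure. As a small sketch using the names from this recipe, the following refreshes that figure every minute:

$> watch -n 60 lvs db_group/db_snap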
In our case, we reserved 10 percent of the total volume size for snapshots, which is a good starting point. It is always possible to return excess space to the database volume with lvextend and resize2fs. To return 10 GB of space to the db_volume volume, we would adjust it while it is unmounted using the following commands (note that resize2fs insists on a filesystem check before an offline resize):

$> umount /db
$> lvextend -L +10G db_group/db_volume
$> e2fsck -f /dev/db_group/db_volume
$> resize2fs /dev/db_group/db_volume
$> mount /dev/db_group/db_volume /db
It is worth mentioning that ZFS has features that negate a lot of the work we performed previously. Unfortunately, properly setting up a good ZFS filesystem is a non-trivial exercise. We highly recommend obtaining a book that outlines how to best use this filesystem if you plan on utilizing this approach.
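As a taste of why, here is a rough equivalent of this recipe on ZFS, using a purely hypothetical pool and dataset named db_pool/pgdata; snapshots require no pre-sized reserve and are reachable under the dataset's hidden .zfs directory:

$> zfs snapshot db_pool/pgdata@backup
$> tar -C /db_pool/pgdata/.zfs/snapshot/backup -c -z -f /backup/db_backup.tar.gz .
$> zfs destroy db_pool/pgdata@backup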
Snapshot backup targets
It may be tempting to back up the database to the source volume, as it's likely very large. This inclination must be resisted, however! Remember that every block written to the source volume consumes copy-on-write space in the snapshot. For very large databases, writing the backup there could require hundreds of GB of snapshot space for the backup alone, ignoring any changes to the database itself.
Best practice suggests directing a backup to another device entirely. Larger systems commonly have a separate backup mount such as a NAS or SAN device where backups can reside in a vault area before being committed to tape for long-term storage. Not only is this safer for a snapshot, but much faster as well, since the backup is not stealing read or write throughput from the source volume.
The assumption throughout all of these examples is that /backup is a separate volume. This is a good idea for all backup types, but it is especially necessary for snapshot backups to avoid I/O contention and snapshot destruction.
Each tablespace is likely to be held on a separate volume, and each would need a separate snapshot command. Even if these are executed virtually simultaneously, there is a slight risk of a race condition between the instants at which each snapshot was frozen, leaving the backed-up data files in an inconsistent state. Any tables knocked out of sync this way should be saved by replaying archived transaction logs during recovery, but be wary of the limitations of this approach.
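For illustration only, snapshotting two hypothetical tablespace volumes (ts1_volume and ts2_volume, names invented for this example) back to back keeps the window between freezes as small as possible:

$> lvcreate -s -L 10G -n ts1_snap db_group/ts1_volume
$> lvcreate -s -L 10G -n ts2_snap db_group/ts2_volume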