Please refer to the Getting a basic export (Simple) recipe for preparing a sample database.
Next, we need to modify the postgresql.conf file so that our database runs in the proper mode for this type of backup. Change the following configuration variables:
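A minimal sketch of those changes, using the values discussed in this recipe:

```
# postgresql.conf -- settings this recipe requires
wal_level = archive        # write WAL in a format usable by pg_basebackup
max_wal_senders = 5        # allow up to five simultaneous backup streams
```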
Then we must allow a superuser to connect to the replication pseudo-database, which is used by pg_basebackup. We do that by adding the following line to pg_hba.conf:
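A sketch of such a line; the postgres user name and the peer authentication method are assumptions, so substitute whatever matches your security policy:

```
# pg_hba.conf -- allow a local superuser to use the replication protocol
local   replication   postgres   peer
```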
Finally, restart the database
instance to commit the changes.
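On a typical Linux installation, either of the following would work; the data directory path and service name here are assumptions:

```
pg_ctl -D /var/lib/pgsql/data restart
# or, via the distribution's service manager:
sudo service postgresql restart
```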
For PostgreSQL, WAL stands for Write-Ahead Log. By changing wal_level to archive, those logs are written in a format compatible with pg_basebackup and other replication-based tools.
By increasing max_wal_senders
from the default of zero, the database will allow tools to connect and request data files. In this case, up to five streams can request data files simultaneously. This maximum should be sufficient for all but the most advanced systems.
The pg_hba.conf
file is essentially a connection access control list (ACL). Since pg_basebackup
uses the replication protocol to obtain data files, we need to allow local connections to request replication.
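Putting these settings together, the backup invocation explained in the following paragraphs would look something like this; the postgres user name is an assumption:

```
pg_basebackup -U postgres -D db_backup -x
```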
Next, we send the backup itself to a directory (-D) named db_backup. This directory will effectively contain a complete copy of the binary files that make up the database.
Finally, we added the -x flag to include transaction logs (xlogs), which the database will require to start if we want to use this backup. When we get into more complex scenarios, we will exclude this option, but for now, it greatly simplifies the process.
The pg_basebackup tool is actually fairly complicated. There is a lot more involved under the hood.
For manually invoked backups, we may want to know how long the process might take and what its current status is. Luckily, pg_basebackup has a progress indicator, which we can enable with the following command:
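A sketch of that command, assuming the same db_backup directory and a postgres superuser:

```
pg_basebackup -U postgres -D db_backup -x -P
```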
Like many of the other switches, -P can be combined with tape archive format, standalone backups, database clones, and so on. This is clearly not necessary for automated backup routines, but could be useful for one-off backups monitored by an administrator.
Compressed tape archive backups
Many binary backup files come in the TAR (Tape Archive) format, which we can activate using the -F flag and setting it to t for TAR. Several Unix backup tools can directly process this type of backup, and most administrators are familiar with it.
If we want compressed output, especially in the case of large databases, we can set the -z flag. For our sample database, we should see almost a 20x compression ratio. Try the following command:
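Something like the following, where -Ft selects the tape archive format and -z enables gzip compression; the postgres user name is an assumption:

```
pg_basebackup -U postgres -D db_backup -Ft -z
```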
The backup file itself will be named base.tar.gz within the db_backup directory, reflecting its status as a compressed tape archive. If the database contains extra tablespaces, each becomes a separate compressed archive. Each file can be extracted to a separate location, such as a different set of disks, for very complicated database instances.
For the sake of this example, we ignored the possible presence of extra tablespaces beyond the pg_default tablespace included in every installation. User-created tablespaces will greatly complicate your backup process.
Making the backup standalone
By specifying -x, we tell the database that we want a "complete" backup. This means we could extract or copy the backup anywhere and start it as a fully qualified database. As we mentioned before, the flag means that we want to include transaction logs, which is how the database recovers from crashes, checks integrity, and performs other important tasks. The following is the command again, for reference:
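Assuming the db_backup directory and postgres superuser used throughout this recipe:

```
pg_basebackup -U postgres -D db_backup -x
```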
When combined with the TAR output format and compression, standalone binary backups are perfect for archiving to tape for later retrieval, as each backup is compressed and self-contained. By default, pg_basebackup
does not include transaction logs, because many (possibly most) administrators back these up separately. These files have multiple uses, and putting them in the basic backup would duplicate efforts and make backups larger than necessary.
We include them at this point because it is still too early for such complicated scenarios. We will get there eventually, of course.
Because pg_basebackup operates through PostgreSQL's replication protocol, it can execute remotely. For instance, if the database were on a server named Production and we wanted a copy on a server named Recovery, we could execute the following command from Recovery:
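A sketch of the remote invocation; the -h value assumes Production resolves as a hostname, and the postgres user name is an assumption:

```
pg_basebackup -h Production -U postgres -D db_backup -x
```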
For this to work, we would also need the following line in pg_hba.conf on Production, allowing Recovery to connect:
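A sketch of that entry; in practice, the address field would usually be Recovery's IP address or CIDR block rather than a bare hostname:

```
host   replication   postgres   Recovery   trust
```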
Though we set the authentication method to trust, this is not recommended for a production server installation. However, it is sufficient to allow Recovery to copy all data from Production. With the -x flag, it also means that the database can be started and kept online in case of emergency. It is both a backup and a running server.
Compression is very CPU intensive, but there are some utilities capable of threading the process. Tools such as pbzip2 or pigz can perform the compression instead. Unfortunately, this only works with a single tablespace (the default one; if you create more, this will not work). The following is the command for compression using pigz:
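A sketch of the pipeline; the output filename is an assumption, and the postgres user name is as before:

```
pg_basebackup -U postgres -D - -Ft -x | pigz -p 4 > db_backup/base.tar.gz
```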
It uses four threads of compression and sets the backup directory to standard output (-) so that pigz can process the output itself.