Performing simple dumps
Backups and data exports are important because, without them, your database is at risk in the event of crashes and storage-related issues. In PostgreSQL, there are two main methods of performing backups:
- Logical dumps (extracting a SQL script that represents your data)
- Transaction log shipping
The idea behind transaction log shipping is to archive binary changes made to the database. Most people claim that transaction log shipping is the only real way to create backups. However, in my opinion, this is not necessarily true.
Many people rely on pg_dump to simply extract a textual representation of the data. Interestingly, pg_dump is also the oldest method of creating a backup and has been around since the very early days of the PostgreSQL project (transaction log shipping was added much later). Every PostgreSQL administrator becomes familiar with pg_dump sooner or later, so it is important to know how it really works and what it does.
Running pg_dump
In the first part of this section, you will learn some basic things about pg_dump. The first thing we want to do is create a simple textual dump, as shown in the following code block:
[hs@linuxpc ~]$ pg_dump test > /tmp/dump.sql
This is the most simplistic backup you can imagine. Basically, pg_dump logs in to the local database instance, connects to a database called test, and starts to extract all of the data, which will then be sent to stdout and redirected to the file. The beauty here is that the standard output gives you all of the flexibility of a Unix system. You can easily compress the data using a pipe or do whatever else you want with it.
In some cases, you might want to run pg_dump as a different user. All PostgreSQL client programs support a consistent set of command-line parameters to pass user information. If you just want to set the user, use the -U flag, as follows:
[hs@linuxpc ~]$ pg_dump -U whatever_powerful_user test > /tmp/dump.sql
The following set of parameters can be found in all PostgreSQL client programs:
...
Connection options:
  -d, --dbname=DBNAME      database to dump
  -h, --host=HOSTNAME      database server host or socket directory
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)
  --role=ROLENAME          do SET ROLE before dump
...
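To illustrate, a dump taken over the network might combine several of these options; the hostname and user shown here are purely hypothetical:
[hs@linuxpc ~]$ pg_dump -h db1.example.com -p 5432 -U backup_user -d test > /tmp/dump.sql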
You can just pass the information you want to pg_dump, and if you have the right level of permission, PostgreSQL will fetch the data. The important thing here is to see how the program really works. Basically, pg_dump connects to the database and opens a large, repeatable read transaction that simply reads all of the data committed before the start of the transaction. Remember, a repeatable read ensures that PostgreSQL creates a consistent snapshot of the data, which does not change throughout the transaction. In other words, a dump is always consistent; no foreign keys will be violated. The output is a snapshot of the data as it was when the dump started. Consistency is a key factor here. It also implies that changes made to the data while the dump is running won’t make it into the backup.
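As a rough sketch of this behavior, you could start a dump in the background and change data while it runs; the table name used here is purely hypothetical:
[hs@linuxpc ~]$ pg_dump test > /tmp/dump.sql &
[hs@linuxpc ~]$ sleep 1   # give pg_dump a moment to open its snapshot
[hs@linuxpc ~]$ psql test -c "INSERT INTO t_demo VALUES (1)"
[hs@linuxpc ~]$ wait      # the freshly inserted row is not part of /tmp/dump.sql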
Important note
A dump simply reads everything; therefore, there is no separate permission required to dump data. As long as you can read it, you can back it up.
Also, note that the backup is in a textual format by default. This means that you can safely extract data from, say, Solaris, and move it to another CPU architecture. In the case of binary copies, this is clearly not possible, as the on-disk format depends on your CPU architecture.
Passing passwords and using the service file
If you take a close look at the connection parameters shown in the previous section, you will notice that there is no way to pass a password to pg_dump. You can enforce a password prompt, but you cannot pass the parameter to pg_dump using a command-line option.
The reason for this is simply that the password might show up in the process table and be visible to other people. The question now is this: if pg_hba.conf, which is on the server, enforces a password, how can the client program provide it?
There are various means of doing this. Here are three:
- Using environment variables
- Using .pgpass to pass connection information
- Using service files
In this section, we will learn about all three methods.
Using environment variables
One way to pass all kinds of parameters is to use environment variables. If information is not explicitly passed to pg_dump, it will look for the missing information in predefined environment variables. A list of all potential settings can be found at https://www.postgresql.org/docs/17/static/libpq-envars.html.
The following overview shows some of the environment variables that are commonly needed for backups:
- PGHOST: This tells the system which host to connect to
- PGPORT: This defines the TCP port to be used
- PGUSER: This tells a client program about the desired user
- PGPASSWORD: This contains the password to be used
- PGDATABASE: This is the name of the database to connect to
The advantage of these environment variables is that the password won’t show up in the process table. However, there’s more.
Consider the following example:
psql -U ... -h ... -p ... -d ...
Given that you are a system administrator, would you really want to type a long command such as this several times every day? If you are working with the very same host again and again, just set those environment variables and connect with plain SQL. The following listing shows how to connect to the database by using environment variables to control the desired behavior:
[hs@linuxpc ~]$ export PGHOST=localhost
[hs@linuxpc ~]$ export PGUSER=hs
[hs@linuxpc ~]$ export PGPASSWORD=abc
[hs@linuxpc ~]$ export PGPORT=5432
[hs@linuxpc ~]$ export PGDATABASE=test
[hs@linuxpc ~]$ psql
psql (17.0)
Type "help" for help.
As you can see, there are no command-line parameters anymore. Just type psql and you are in.
Important note
All applications based on the standard PostgreSQL C language client library (libpq) will understand these environment variables, so you can use them not only for psql and pg_dump but also for many other applications.
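For example, assuming the variables exported above are still set, a dump needs no connection flags at all:
[hs@linuxpc ~]$ pg_dump > /tmp/dump.sql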
Using .pgpass to pass connection information
A very common way to store login information is to use the .pgpass file. The idea is simple: put a file called .pgpass into your home directory and add your login information there. The following listing shows the basic format:
hostname:port:database:username:password
An example would be the following:
192.168.0.45:5432:mydb:xy:abc
PostgreSQL offers some nice additional functionality: most fields can contain *.
Here is an example:
*:*:*:xy:abc
The * character implies that on every host, on every port, for every database, the user called xy will use abc as the password. To make PostgreSQL use the .pgpass file, make sure that the right file permissions are in place. Without the following line, things won’t work properly:
chmod 0600 ~/.pgpass
chmod sets the file-level permissions. These are necessary to protect the file. Furthermore, .pgpass can also be used on a Windows system. In that case, the file can be found in the %APPDATA%\postgresql\pgpass.conf path.
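Putting it all together, a minimal setup on Linux might look like the following sketch; the credentials are, of course, just placeholders:
[hs@linuxpc ~]$ echo "localhost:5432:test:hs:abc" >> ~/.pgpass
[hs@linuxpc ~]$ chmod 0600 ~/.pgpass
[hs@linuxpc ~]$ pg_dump -h localhost -U hs test > /tmp/dump.sql   # no password prompt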
Using service files
However, .pgpass is not the only file you can use. You can also make use of service files. Here’s how they work: if you want to connect to the very same servers over and over again, you can create a .pg_service.conf file. It will hold all of the connection information you need.
Here is an example of a .pg_service.conf file:
Mac:~ hs$ cat .pg_service.conf
# a sample service
[hansservice]
host=localhost
port=5432
dbname=test
user=hs
password=abc

[paulservice]
host=192.168.0.45
port=5432
dbname=xyz
user=paul
password=cde
To connect to one of these services, just set the environment variable and connect as follows:
iMac:~ hs$ export PGSERVICE=hansservice
A connection can now be established without passing parameters to psql:
iMac:~ hs$ psql
psql (17.0)
Type "help" for help.
test=#
As you can see, the login works without additional command-line parameters. Alternatively, you can use the following command:
psql service=hansservice
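Since pg_dump also accepts a connection string as the database name, the same service entry can be used for backups; the target file is, again, just an example:
iMac:~ hs$ pg_dump service=hansservice > /tmp/dump.sql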
Now that we have learned how to pass passwords and connection information, let’s move on to learning how to extract subsets of data.
Extracting subsets of data
So far, we have seen how to dump an entire database. However, this may not be what we want to do. In many cases, we just want to extract a subset of tables or schemas. Fortunately, pg_dump can help us do that by providing several switches (a couple of example invocations follow the list):
- -a: This only dumps the data and does not dump the data structure
- -s: This dumps the data structure but skips the data
- -n: This only dumps a certain schema
- -N: This dumps everything but excludes certain schemas
- -t: This only dumps certain tables
- -T: This dumps everything but certain tables (this makes sense if you want to exclude logging tables and so on)
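For instance, the following sketches dump just the structure, a couple of tables, or everything except one schema; the table and schema names are purely hypothetical:
[hs@linuxpc ~]$ pg_dump -s test > /tmp/structure_only.sql
[hs@linuxpc ~]$ pg_dump -t t_customer -t t_order test > /tmp/two_tables.sql
[hs@linuxpc ~]$ pg_dump -N logging test > /tmp/without_logging_schema.sql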
Partial dumps can be very useful to speed things up considerably. Now that we have learned how to perform simple dumps, let’s learn how to handle various file formats.