
Performing simple dumps

Backups and data exports are important because, without backups, you are effectively putting your database at risk in the case of crashes and storage-related issues. If you are running a PostgreSQL setup, there are basically two main methods of performing backups:

  • Logical dumps (extracting a SQL script that represents your data)
  • Transaction log shipping

The idea behind transaction log shipping is to archive binary changes made to the database. Most people claim that transaction log shipping is the only real way to create backups. However, in my opinion, this is not necessarily true.

Many people rely on pg_dump to simply extract a textual representation of the data. Interestingly, pg_dump is also the oldest method of creating a backup and has been around since the very early days of the PostgreSQL project (transaction log shipping was added much later). Every PostgreSQL administrator becomes familiar with pg_dump sooner or later, so it is important to know how it really works and what it does.

Running pg_dump

In the first part of this section, you will learn some basic things about pg_dump. The first thing we want to do is create a simple textual dump, as shown in the following code block:

[hs@linuxpc ~]$ pg_dump test > /tmp/dump.sql

This is the simplest backup you can imagine. Basically, pg_dump logs in to the local database instance, connects to a database called test, and starts to extract all of the data, which is then sent to stdout and redirected to the file. The beauty here is that standard output gives you all of the flexibility of a Unix system. You can easily compress the data using a pipe or do whatever else you want with it.
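
For instance, because the dump is written to standard output, you can compress it on the fly by piping it through a tool such as gzip (the file name used here is, of course, just an example):

[hs@linuxpc ~]$ pg_dump test | gzip > /tmp/dump.sql.gz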

In some cases, you might want to run pg_dump as a different user. All PostgreSQL client programs support a consistent set of command-line parameters to pass the user information. If you just want to set the user, use the -U flag, as follows:

[hs@linuxpc ~]$ pg_dump -U whatever_powerful_user test > /tmp/dump.sql

The following set of parameters can be found in all PostgreSQL client programs:

...
Connection options:
  -d, --dbname=DBNAME      database to dump
  -h, --host=HOSTNAME      database server host or socket directory
  -p, --port=PORT          database server port number
  -U, --username=NAME      connect as specified database user
  -w, --no-password        never prompt for password
  -W, --password           force password prompt (should happen automatically)
  --role=ROLENAME          do SET ROLE before dump
...

You can just pass the information you want to pg_dump, and if you have the right level of permission, PostgreSQL will fetch the data. The important thing here is to see how the program really works. Basically, pg_dump connects to the database and opens a large, repeatable read transaction that simply reads all of the data committed before the start of the transaction. Remember, repeatable read ensures that PostgreSQL creates a consistent snapshot of the data, which does not change throughout the transaction. In other words, a dump is always consistent; no foreign keys will be violated. The output is a snapshot of the data as it was when the dump started. Consistency is a key factor here. It also implies that changes made to the data while the dump is running won't make it into the backup anymore.
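
To illustrate the idea, the following sketch shows what such a snapshot-based read looks like conceptually. Note that pg_dump manages this transaction internally, and the table names used here are purely hypothetical:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- every statement inside this transaction sees the same snapshot,
-- no matter what other sessions commit in the meantime
SELECT * FROM t_customer;
SELECT * FROM t_order;    -- consistent with the previous read
COMMIT;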

Important note

A dump simply reads everything—therefore, there are no separate permissions to be able to dump something. As long as you can read it, you can back it up.

Also, note that the backup is in a textual format by default. This means that you can safely extract data from, say, Solaris, and move it to another CPU architecture. In the case of binary copies, this is clearly not possible, as the on-disk format depends on your CPU architecture.
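
To get a feeling for what such a textual dump looks like, simply inspect the beginning of the file. The exact content depends on your schema and PostgreSQL version; the following shortened output is merely a sketch, and the t_test table is a made-up example:

[hs@linuxpc ~]$ head /tmp/dump.sql
--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
...
CREATE TABLE public.t_test (
    id integer
);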

Passing passwords and using the service file

If you take a close look at the connection parameters shown in the previous section, you will notice that there is no way to pass a password to pg_dump. You can enforce a password prompt, but you cannot pass the parameter to pg_dump using a command-line option.

The reason for this is simple: the password might show up in the process table and be visible to other people. The question now is this: if pg_hba.conf on the server enforces a password, how can the client program provide it?

There are various means of doing this. Here are three:

  • Using environment variables
  • Using .pgpass to pass connection information
  • Using service files

In this section, we will learn about all three methods.

Using environment variables

One way to pass all kinds of parameters is to use environment variables. If information is not explicitly passed to pg_dump, it will look for the missing information in predefined environment variables. A list of all potential settings can be found at https://www.postgresql.org/docs/17/static/libpq-envars.html.

The following overview shows some of the environment variables that are commonly needed for backups:

  • PGHOST: This tells the system which host to connect to
  • PGPORT: This defines the TCP port to be used
  • PGUSER: This tells a client program about the desired user
  • PGPASSWORD: This contains the password to be used
  • PGDATABASE: This is the name of the database to connect to

The advantage of these environment variables is that the password won't show up in the process table. However, there's more.

Consider the following example:

psql  -U ... -h ... -p ... -d ...

Given that you are a system administrator, would you really want to type a long command such as this a couple of times every day? If you are working with the very same host again and again, you can just set those environment variables and connect with a plain call to psql. The following listing shows how to connect to the database by using environment variables to control the desired behavior:

[hs@linuxpc ~]$ export PGHOST=localhost
[hs@linuxpc ~]$ export PGUSER=hs
[hs@linuxpc ~]$ export PGPASSWORD=abc
[hs@linuxpc ~]$ export PGPORT=5432
[hs@linuxpc ~]$ export PGDATABASE=test
[hs@linuxpc ~]$ psql
psql (17.0)
Type "help" for help.

As you can see, there are no command-line parameters anymore. Just type psql and you are in.

Important note

All applications based on the standard PostgreSQL C language client library (libpq) will understand these environment variables, so you can use them not only for psql and pg_dump but also for many other applications.
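
For example, with the environment variables from the previous listing still in place, pg_dump can also be called without any connection parameters; it picks up the host, port, user, password, and database from the environment:

[hs@linuxpc ~]$ pg_dump > /tmp/dump.sql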

Using .pgpass to pass connection information

A very common way to store login information is the .pgpass file. The idea is simple: put a file called .pgpass into your home directory and store your login information there. The following listing shows the basic format:

hostname:port:database:username:password

An example would be the following:

192.168.0.45:5432:mydb:xy:abc

PostgreSQL offers some nice additional functionality here: most fields can contain *.

Here is an example:

*:*:*:xy:abc

The * character implies that on every host, on every port, for every database, the user called xy will use abc as the password. To make PostgreSQL use the .pgpass file, make sure that the right file permissions are in place. Without the following line, things won't work properly:

chmod 0600 ~/.pgpass

chmod sets the file-level permissions, which are necessary to protect the file; PostgreSQL will refuse to use a .pgpass file that is readable by other users. Furthermore, .pgpass can also be used on a Windows system. In that case, the file is located at %APPDATA%\postgresql\pgpass.conf.
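
Putting it all together, the workflow might look like this (the host, database, and user are the same made-up values as in the earlier example):

[hs@linuxpc ~]$ echo '192.168.0.45:5432:mydb:xy:abc' >> ~/.pgpass
[hs@linuxpc ~]$ chmod 0600 ~/.pgpass
[hs@linuxpc ~]$ psql -h 192.168.0.45 -p 5432 -U xy -d mydb   # no password prompt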

Using service files

However, .pgpass is not the only file you can use. You can also make use of service files. Here’s how they work: if you want to connect to the very same servers over and over again, you can create a .pg_service.conf file. It will hold all of the connection information you need.

Here is an example of a .pg_service.conf file:

iMac:~ hs$ cat .pg_service.conf
# a sample service
[hansservice]
host=localhost
port=5432
dbname=test
user=hs
password=abc

[paulservice]
host=192.168.0.45
port=5432
dbname=xyz
user=paul
password=cde

To connect to one of these services, just set the PGSERVICE environment variable:

iMac:~ hs$ export PGSERVICE=hansservice

A connection can now be established without passing parameters to psql:

iMac:~ hs$ psql
psql (17.0)
Type "help" for help.
test=#

As you can see, the login works without additional command-line parameters. Alternatively, you can use the following command:

psql service=hansservice
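
The service mechanism is not limited to psql. Any libpq-based tool accepts a service definition, so a dump using the service defined previously could, for instance, be taken like this:

iMac:~ hs$ pg_dump "service=hansservice" > /tmp/dump.sql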

Now that we have learned how to pass passwords and connection information, let’s move on to learning how to extract subsets of data.

Extracting subsets of data

So far, we have seen how to dump an entire database. However, this may not be what we want to do. In many cases, we just want to extract a subset of tables or schemas. Fortunately, pg_dump provides several switches for exactly that purpose (examples follow the list):

  • -a: This only dumps the data and does not dump the data structure
  • -s: This dumps the data structure but skips the data
  • -n: This only dumps a certain schema
  • -N: This dumps everything but excludes certain schemas
  • -t: This only dumps certain tables
  • -T: This dumps everything but certain tables (this makes sense if you want to exclude logging tables and so on)
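
The following commands sketch how these switches can be used in practice; the table and schema names are purely hypothetical:

[hs@linuxpc ~]$ pg_dump -s test > /tmp/structure.sql      # structure only
[hs@linuxpc ~]$ pg_dump -a test > /tmp/data_only.sql      # data only
[hs@linuxpc ~]$ pg_dump -t t_customer test > /tmp/customer.sql
[hs@linuxpc ~]$ pg_dump -T t_log test > /tmp/without_log.sql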

Partial dumps can be very useful to speed things up considerably. Now that we have learned how to perform simple dumps, let’s learn how to handle various file formats.
