Database server files are initially stored in a location referred to as the data directory. Additional data files may also be stored in tablespaces if any exist.
In this recipe, you will learn how to find the location of these directories on a given database server.
Getting ready
You'll need to get operating system access to the database system, which is what we call the platform on which the database runs.
How to do it...
If you can connect using psql, then you can use this command:
postgres=# SHOW data_directory;
data_directory
----------------------
/var/lib/pgsql/data/
If not, the following are the system default data
directory locations:
- Debian or Ubuntu systems:
/var/lib/postgresql/MAJOR_RELEASE/main
- Red Hat RHEL, CentOS, and Fedora:
/var/lib/pgsql/data/
- Windows:
C:\Program Files\PostgreSQL\MAJOR_RELEASE\data
MAJOR_RELEASE
is composed of just one number (for release 10 and above) or two (for releases up to 9.6).
On Debian or Ubuntu systems, the configuration files are located in /etc/postgresql/MAJOR_RELEASE/main/
, where main
is just the name of a database server. Other names are also possible. For the sake of simplicity, we assume that you only have a single installation, although the point of including the release number and database server name as components of the directory path is to allow multiple database servers to coexist on the same host.
Note
The pg_lsclusters
utility is specific to Debian/Ubuntu and displays a list of all the available database servers, including information for each server.
The information for each server includes the following:
- Major release number
- Port
- Status (for example, online and down)
- Data directory
- Log file
The pg_lsclusters
utility is part of the postgresql-common
Debian/Ubuntu package, which provides a structure under which multiple versions of PostgreSQL can be installed, and multiple clusters can be maintained, at the same time.
In the packages distributed with Red Hat RHEL, CentOS, and Fedora, the default data directory location also contains the configuration files (*.conf
) by default. However, note that the packages distributed by the PostgreSQL community use a different default location: /var/lib/pgsql/MAJOR_RELEASE/data/
.
Again, that is just the default location. You can create additional data directories using the initdb
utility.
The initdb
utility populates the given data directory with the initial content. The directory will be created for convenience if it is missing but, for safety, the utility will stop if the data
directory is not empty. The initdb
utility will read the data
directory name from the PGDATA
environment variable unless the -d
command-line option is used.
How it works...
Even though the Debian/Ubuntu and Red Hat file layouts are different, they both follow the Linux Filesystem Hierarchy Standard (FHS), so neither layout is wrong.
The Red Hat layout is simpler and easier to understand. The Debian/Ubuntu layout is more complex, but it has different and more adventurous goals. The Debian/Ubuntu layout is similar to the Optimal Flexible Architecture (OFA) of other database systems. As pointed out earlier, the goals are to provide a file layout that will allow you to have multiple PostgreSQL database servers on one system and to allow many versions of the software to exist in the filesystem at once.
Again, the layouts for the Windows and OS X installers are different. Multiple database clusters are possible, but they are also more complex than on Debian/Ubuntu.
I recommend that you follow the Debian/Ubuntu layout on whichever platform you are using. It doesn't really have a name, so I call it the PostgreSQL Flexible Architecture (PFA). Clearly, if you are using Debian or Ubuntu, then the Debian/Ubuntu layout is already being used. If you do this on other platforms, you'll need to lay things out yourself, but it does pay off in the long run. To implement PFA, you can set the following environment variables to name parts of the file layout:
export PGROOT=/var/lib/pgsql/
export PGRELEASE=14
export PGSERVERNAME=mamba
export PGDATA=$PGROOT/$PGRELEASE/$PGSERVERNAME
In this example, PGDATA
is /var/lib/pgsql/14/mamba
.
Finally, you must run initdb
to initialize the data
directory, as noted earlier, and custom administration scripts should be prepared to automate actions, such as starting or stopping the database server, when the system undergoes similar procedures.
Note that server applications such as initdb
can only work with one major PostgreSQL version. On distributions that allow several major versions, such as Debian or Ubuntu, these applications are placed in dedicated directories, which are not put in the default command path. This means that if you just type initdb
, the system will not find the executable, and you will get an error message.
This may look like a bug, but in fact, it is the desired behavior. Instead of accessing initdb
directly, you are supposed to use the pg_createcluster
utility from postgresql-common
, which will select the right initdb
utility depending on the major version you specify.
Note
If you plan to run more than one database server on the same host, you must set the preceding variables differently for each server as they determine the name of the data
directory. For instance, you can set them in the script that you use to start or stop the database server, which would be enough because PGDATA
is mostly used only by the database server process.
There's more…
Once you've located the data directory, you can look for the files that comprise the PostgreSQL database server. The layout is as follows:
Figure 2.2 – Contents of the PostgreSQL data directory
None of the aforementioned directories contain user-modifiable files, nor should any of the files be manually deleted to save space, or for any other reason. Don't touch it, because you'll break it, and you may not be able to fix it! It's not even sensible to copy files in these directories without carefully following the procedures described in Chapter 11, Backup and Recovery. Keep off the grass!
We'll talk about tablespaces later in the book. We'll also discuss a performance enhancement that involves putting the transaction log on its own set of disk drives in Chapter 10, Performance and Concurrency.
The only things you are allowed to touch are configuration files, which are all *.conf
files, and server message log files. Server message log files may or may not be in the data directory. For more details on this, refer to the next recipe, Locating the database server's message log.