Initializing a cluster
First we add an OS user. This user will be used to start/stop/restart the database. This user will also be the superuser for the cluster. The following command has to be executed as root or with sudo
:
adduser postgres
Tip
The new user need not be named postgres
. It can be mydbadmin
, mydba
, or anything we fancy.
Next, we create a directory that will be the base directory for the new cluster. This could be anywhere. A standard location can be /usr/local/pgsql/data
. However, you might want to have the database cluster on a separate partition. In the event of an OS and associated file system crash, your database data remains intact. It can also be that you want to use faster spinning disks or Solid State Disks for the database cluster to improve performance. In short, performance and/or reliability concerns can make you choose a location other than the default location to initialize the database cluster. As root, we execute the commands:
[root@MyCentOS extension]# mkdir -p /pgdata/9.3
The -p
option ensures that the parent directory is also created if it is nonexistent:
[root@MyCentOS extension]# chown postgres /pgdata/9.3
Then, we switch user to postgres
. This is necessary because when we initialize the cluster, the user under which the command was executed becomes the owner of the cluster. The server process will also be owned by this user. We will go with the standard user –postgres
, which we created:
su - postgres
The next step is to run the initdb
script. Well! Not exactly. If we run initdb
now, we will get an error:
[postgres@MyCentOS ~]$ initdb -bash: initdb: command not found
This is because we haven't added the directory containing the PostgreSQL executables to the environment variable PATH
yet. We could provide the absolute path and make it work. However, it is better to set the variables.
In .bash_profile
of postgres
, we have the following lines:
PATH=$PATH:$HOME/bin export PATH
Just before the export PATH
line, add:
PATH=$PATH:/usr/local/pgsql/bin
Then, try this:
[postgres@MyCentOS ~]$ which initdb /usr/bin/which: no initdb in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/postgres/bin)
Not surprising, as .bash_profile
doesn't get executed unless we source it or log out and log in. Log out, log in, and try again:
[postgres@MyCentOS ~]$ exit logout [root@MyCentOS ~]# su - postgres [postgres@MyCentOS ~]$ which initdb /usr/local/pgsql/bin/initdb
Now we are good to go! It's a good idea to execute the following:
initdb --help | more
Among the many parameters available, the important ones, in most cases, will be -D
or --pgdata
. This parameter is used to define the directory where the cluster will be initialized (where the database cluster should be stored). This is the only mandatory parameter. Another parameter that can be useful is --pwprompt
. Using this, we can set the password for the database superuser. So, we execute the following command:
initdb --pgdata=/pgdata/9.3 --pwprompt
If this is not set now and password authentication is to be used, we have to set the password later, as shown here:
As seen in the preceding screenshot, the process asks for the superuser password. Towards the end, it gives a warning that the trust authentication is enabled for local connections. This means that it will be possible to make connections from the localhost without being prompted for a password. It's a good idea to change this setting. We will come to this later. For more options available for initdb
, please refer to http://www.postgresql.org/docs/current/static/app-initdb.html.
As always, let's see what happened; which directories got created when we initialized the cluster:
cd /pgdata/9.3 [postgres@MyCentOS 9.3]$ find ./ -maxdepth 1 -type d ./ ./base ./pg_stat ./pg_clog ./pg_xlog ./pg_tblspc ./pg_twophase ./pg_subtrans ./global ./pg_notify ./pg_stat_tmp ./pg_snapshots ./pg_multixact ./pg_serial
A quick walk through the directories
We will start the cluster as it will help us to relate the files system with the databases:
[postgres@MyCentOS base]$ expo rt PGDATA=/pgdata/9.3/
PGDATA is the default data directory location.
[postgres@MyCentOS base]$ pg_ctl start server starting [postgres@MyCentOS base]$ LOG: database system was shut down at 2013-10-13 13:48:07 IST LOG: database system is ready to accept connections LOG: autovacuum launcher started
The pg_ctl
is a utility to start, stop, check the status, or restart the database cluster. Passing init
or initdb
as a parameter results in the initialization of a cluster. More options can be explored at http://www.postgresql.org/docs/current/static/app-pg-ctl.html.
Now, we will go through the directories:
base
: This directory holds the databases that will be created by database users. This also holds thepg_defaulttablespace
with the databases:postgres
,template0
, andtemplate1
:[postgres@MyCentOS base]$ pwd /pgdata/9.3/base [postgres@MyCentOS base]$ find ./ -type d ./ ./1 ./12891 ./12896 [postgres@MyCentOS base]$ oid2name All databases: Oid Database Name Tablespace ---------------------------------- 12896 postgrespg_default 12891 template0pg_default 1 template1 pg_default
- We can see here that
Oid
points to the directory names. We can try creating and dropping a database, as shown in the following code:[postgres@MyCentOS base]$ pwd /pgdata/9.3/base [postgres@MyCentOS base]$ psql psql (9.3.0) Type "help" for help. postgres=# \! ls 1 12891 12896 postgres=# CREATE DATABASE test; CREATE DATABASE postgres=# \! ls 1 12891 12896 16385 postgres=# DROP DATABASE test; DROP DATABASE postgres=# \! ls 1 12891 12896
Tip
OID stands for Object Identifier. They are used internally by PostgreSQL as primary keys for various system tables. Objects created in the cluster (tables, databases, and so on) will have OIDs associated with them. The
oid2name
utility is available as acontrib
module and helps us examine the databases, tables, and related file nodes. - We note a few things here. First, we were not prompted for a password (remember the warning about trust authentication?). Next, we can execute host commands from
psql
(more on this in a later chapter). Third, creating/dropping databases are actually similar to creating/deleting directories. PostgreSQL does do quite a bit of internal book-keeping when we create or drop objects in addition to manipulating directories.
- We can see here that
global
: This directory contains cluster-wide tables. There are many tables and associated views that keep track of the entire cluster, namely, database roles, system catalog data, and so on.pg_clog
: This directory contains the transaction commit status data.pg_multixact
: This directory contains multitransaction status data (concurrent transactions waiting for a lock).pg_notify
: This directory contains the LISTEN/NOTIFY status data.pg_serial
: This directory contains information about committed serializable transactions.pg_snapshots
: This directory contains exported snapshots.pg_stat_tmp
: This directory contains temporary files for the statistics subsystem.pg_subtrans
: This directory contains subtransaction status data.pg_tblspc
: This directory contains symbolic links to tablespaces.pg_twophase
: This directory contains state files for prepared transactions.pg_xlog
: This directory contains Write Ahead Log (WAL) files.
In short, we have directories to hold containers of real user created data (tables and tablespaces), directories to hold data about all the data and data structures (metadata), and then directories to hold data about the state of transactions.
Processes created
Let's take a look at the processes spawned when we started PostgreSQL:
[root@MyCentOS ~]# ps -fupostgres
The following screenshot illustrates the processes spawned:
We can see that there is one parent process (pid1566
), which spawned a few child processes.
Important files created
One important file that gets created when we initialize a database cluster is postgresql.conf
. This file contains a number of critical parameters related to the server processes and resource allocation, memory management, logging, file locations, and so on. Another file is pg_hba.conf."hba"
(which stands for host-based authentication). Changes to these files are necessary to enable client connection to the database from a different machine. Both of these are in the PGDATA
folder.
The postmaster.pid
file in the same directory is used by pg_ctl
to determine whether the database is running:
[postgres@MyCentOS 9.3]$ pg_ctl status pg_ctl: server is running (PID: 1566)
The contents of the file are as follows:
[postgres@MyCentOS 9.3]$ head -1 postmaster.pid 1566
The 1566
number in the pid
file is the same as what we got for the parent process when we did a process listing earlier.