Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
PostgreSQL for Data Architects

You're reading from   PostgreSQL for Data Architects Discover how to design, develop, and maintain your database application effectively with PostgreSQL

Arrow left icon
Product type Paperback
Published in Mar 2015
Publisher
ISBN-13 9781783288601
Length 272 pages
Edition 1st Edition
Arrow right icon
Author (1):
Arrow left icon
Jayadevan M Jayadevan M
Author Profile Icon Jayadevan M
Jayadevan M
Arrow right icon
View More author details
Toc

Table of Contents (14) Chapters Close

Preface 1. Installing PostgreSQL FREE CHAPTER 2. Server Architecture 3. PostgreSQL – Object Hierarchy and Roles 4. Working with Transactions 5. Data Modeling with SQL Power Architect 6. Client Tools 7. SQL Tuning 8. Server Tuning 9. Tools to Move Data in and out of PostgreSQL 10. Scaling, Replication, and Backup and Recovery 11. PostgreSQL – Troubleshooting 12. PostgreSQL – Extras Index

Initializing a cluster

First we add an OS user. This user will be used to start/stop/restart the database. This user will also be the superuser for the cluster. The following command has to be executed as root or with sudo:

adduser postgres

Tip

The new user need not be named postgres. It can be mydbadmin, mydba, or anything we fancy.

Next, we create a directory that will be the base directory for the new cluster. This could be anywhere. A standard location can be /usr/local/pgsql/data. However, you might want to have the database cluster on a separate partition. In the event of an OS and associated file system crash, your database data remains intact. It can also be that you want to use faster spinning disks or Solid State Disks for the database cluster to improve performance. In short, performance and/or reliability concerns can make you choose a location other than the default location to initialize the database cluster. As root, we execute the commands:

[root@MyCentOS extension]# mkdir -p /pgdata/9.3

The -p option ensures that the parent directory is also created if it is nonexistent:

[root@MyCentOS extension]# chown postgres /pgdata/9.3

Then, we switch user to postgres. This is necessary because when we initialize the cluster, the user under which the command was executed becomes the owner of the cluster. The server process will also be owned by this user. We will go with the standard user –postgres, which we created:

su - postgres

The next step is to run the initdb script. Well! Not exactly. If we run initdb now, we will get an error:

[postgres@MyCentOS ~]$ initdb
-bash: initdb: command not found

This is because we haven't added the directory containing the PostgreSQL executables to the environment variable PATH yet. We could provide the absolute path and make it work. However, it is better to set the variables.

In .bash_profile of postgres, we have the following lines:

PATH=$PATH:$HOME/bin
export PATH

Just before the export PATH line, add:

PATH=$PATH:/usr/local/pgsql/bin

Then, try this:

[postgres@MyCentOS ~]$ which initdb
/usr/bin/which: no initdb in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/postgres/bin)

Not surprising, as .bash_profile doesn't get executed unless we source it or log out and log in. Log out, log in, and try again:

[postgres@MyCentOS ~]$ exit
logout
[root@MyCentOS ~]# su - postgres
[postgres@MyCentOS ~]$ which initdb
/usr/local/pgsql/bin/initdb

Now we are good to go! It's a good idea to execute the following:

initdb --help | more

Among the many parameters available, the important ones, in most cases, will be -D or --pgdata. This parameter is used to define the directory where the cluster will be initialized (where the database cluster should be stored). This is the only mandatory parameter. Another parameter that can be useful is --pwprompt. Using this, we can set the password for the database superuser. So, we execute the following command:

initdb --pgdata=/pgdata/9.3 --pwprompt

If this is not set now and password authentication is to be used, we have to set the password later, as shown here:

Initializing a cluster

As seen in the preceding screenshot, the process asks for the superuser password. Towards the end, it gives a warning that the trust authentication is enabled for local connections. This means that it will be possible to make connections from the localhost without being prompted for a password. It's a good idea to change this setting. We will come to this later. For more options available for initdb, please refer to http://www.postgresql.org/docs/current/static/app-initdb.html.

As always, let's see what happened; which directories got created when we initialized the cluster:

cd /pgdata/9.3
[postgres@MyCentOS 9.3]$ find ./ -maxdepth 1 -type d
./
./base
./pg_stat
./pg_clog
./pg_xlog
./pg_tblspc
./pg_twophase
./pg_subtrans
./global
./pg_notify
./pg_stat_tmp
./pg_snapshots
./pg_multixact
./pg_serial

A quick walk through the directories

We will start the cluster as it will help us to relate the files system with the databases:

[postgres@MyCentOS base]$ expo
rt PGDATA=/pgdata/9.3/

PGDATA is the default data directory location.

[postgres@MyCentOS base]$ pg_ctl start
server starting
[postgres@MyCentOS base]$ LOG:  database system was shut down at 2013-10-13 13:48:07 IST
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

The pg_ctl is a utility to start, stop, check the status, or restart the database cluster. Passing init or initdb as a parameter results in the initialization of a cluster. More options can be explored at http://www.postgresql.org/docs/current/static/app-pg-ctl.html.

Now, we will go through the directories:

  • base: This directory holds the databases that will be created by database users. This also holds the pg_defaulttablespace with the databases: postgres, template0, and template1:
    [postgres@MyCentOS base]$ pwd
    /pgdata/9.3/base
    [postgres@MyCentOS base]$ find ./ -type d
    ./
    ./1
    ./12891
    ./12896
    [postgres@MyCentOS base]$ oid2name
    All databases:
    Oid  Database Name  Tablespace
    ----------------------------------
      12896       postgrespg_default
      12891      template0pg_default
          1      template1  pg_default
    
    • We can see here that Oid points to the directory names. We can try creating and dropping a database, as shown in the following code:
      [postgres@MyCentOS base]$ pwd
      /pgdata/9.3/base
      [postgres@MyCentOS base]$ psql
      psql (9.3.0)
      Type "help" for help.
      
      postgres=# \! ls
      1  12891  12896
      postgres=# CREATE DATABASE test;
      CREATE DATABASE
      postgres=# \! ls
      1  12891  12896  16385
      postgres=# DROP DATABASE test;
      DROP DATABASE
      postgres=# \! ls
      1  12891  12896
      

      Tip

      OID stands for Object Identifier. They are used internally by PostgreSQL as primary keys for various system tables. Objects created in the cluster (tables, databases, and so on) will have OIDs associated with them. The oid2name utility is available as a contrib module and helps us examine the databases, tables, and related file nodes.

    • We note a few things here. First, we were not prompted for a password (remember the warning about trust authentication?). Next, we can execute host commands from psql (more on this in a later chapter). Third, creating/dropping databases are actually similar to creating/deleting directories. PostgreSQL does do quite a bit of internal book-keeping when we create or drop objects in addition to manipulating directories.
  • global: This directory contains cluster-wide tables. There are many tables and associated views that keep track of the entire cluster, namely, database roles, system catalog data, and so on.
  • pg_clog: This directory contains the transaction commit status data.
  • pg_multixact: This directory contains multitransaction status data (concurrent transactions waiting for a lock).
  • pg_notify: This directory contains the LISTEN/NOTIFY status data.
  • pg_serial: This directory contains information about committed serializable transactions.
  • pg_snapshots: This directory contains exported snapshots.
  • pg_stat_tmp: This directory contains temporary files for the statistics subsystem.
  • pg_subtrans: This directory contains subtransaction status data.
  • pg_tblspc: This directory contains symbolic links to tablespaces.
  • pg_twophase: This directory contains state files for prepared transactions.
  • pg_xlog: This directory contains Write Ahead Log (WAL) files.

In short, we have directories to hold containers of real user created data (tables and tablespaces), directories to hold data about all the data and data structures (metadata), and then directories to hold data about the state of transactions.

Processes created

Let's take a look at the processes spawned when we started PostgreSQL:

[root@MyCentOS ~]# ps  -fupostgres

The following screenshot illustrates the processes spawned:

Processes created

We can see that there is one parent process (pid1566), which spawned a few child processes.

Important files created

One important file that gets created when we initialize a database cluster is postgresql.conf. This file contains a number of critical parameters related to the server processes and resource allocation, memory management, logging, file locations, and so on. Another file is pg_hba.conf."hba" (which stands for host-based authentication). Changes to these files are necessary to enable client connection to the database from a different machine. Both of these are in the PGDATA folder.

The postmaster.pid file in the same directory is used by pg_ctl to determine whether the database is running:

[postgres@MyCentOS 9.3]$ pg_ctl status
pg_ctl: server is running (PID: 1566)

The contents of the file are as follows:

[postgres@MyCentOS 9.3]$ head -1 postmaster.pid
1566

The 1566 number in the pid file is the same as what we got for the parent process when we did a process listing earlier.

You have been reading a chapter from
PostgreSQL for Data Architects
Published in: Mar 2015
Publisher:
ISBN-13: 9781783288601
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image