PostgreSQL 13 Cookbook

Cluster Management Techniques

This chapter consists of several recipes covering cluster management techniques. We shall start with recipes that show some of the utilities available for creating and dropping databases. We will see how a database and a table can be located on the file system, and then see how a schema can be created. We shall also understand the advantages of using schemas, along with the steps involved in assigning the ownership of a schema to a user. We'll then move on to discuss the methods involved in looking at the size of a table and an index in PostgreSQL.

Over time, database activity may cause objects to grow very large. This is when we may wish to move tables and indexes across different tablespaces to distribute the IOPS across multiple disks. Additionally, you may wish to create archive tables to store old data that is rarely accessed but kept only to satisfy compliance requirements. For this purpose, we shall discuss the steps involved in creating a tablespace and moving a table to a tablespace.

We will then move on to user management in PostgreSQL, where we will see how to create and drop a user, how to assign and revoke a privilege from a user, and how to properly manage the segregation of privileges using roles.

Finally, we will end the chapter by discussing how the MVCC implementation is different in PostgreSQL, along with an introduction to VACUUM.

The following are the recipes that will be discussed in this chapter:

  • Creating and dropping databases
  • Locating a database and a table
  • Creating a schema
  • Checking table and index sizes
  • Creating tablespaces
  • Moving tables to a different tablespace
  • Creating a user
  • Dropping a user
  • Assigning and revoking privileges
  • Creating a group role for role-based segregation
  • MVCC implementation and VACUUM

Technical requirements

In order to test the code you'll see in this chapter, you'll need the following:

  • You must have a Linux server with PostgreSQL installed and running.
  • You must be able to connect as root or have sudo access to perform some commands as root.
  • You must be able to connect to the server as the OS user (postgres) who owns the data directory.

Creating and dropping databases

So far, we have seen how to install PostgreSQL, initialize a PostgreSQL cluster, and start and stop a cluster. In this recipe, we shall discuss how to create or drop a database using the createdb and dropdb utilities.

There are multiple ways to create and drop a database in PostgreSQL. Upon successful installation of PostgreSQL, you have two utilities, called createdb and dropdb, that can be used to create and drop databases in PostgreSQL. The same can also be done using the psql utility.

Let's look at this in detail in this recipe.

Getting ready

In order to create a database, we must either be a superuser or have the CREATEDB privilege. Also, the user who is dropping a database must either be a superuser or the owner of the database.
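If you are not sure whether the user you are connected as has these privileges, a quick catalog query can confirm it. The following is a minimal check using the standard pg_roles columns rolsuper and rolcreatedb; a value of t in either column means the user is allowed to create databases:

$ psql -c "SELECT rolname, rolsuper, rolcreatedb FROM pg_roles WHERE rolname = current_user"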

How to do it...

The following are the steps involved in creating and dropping a database using the createdb and dropdb utilities:

  1. We will use help to list all the arguments for the createdb utility:
$ createdb --help
createdb creates a PostgreSQL database.

Usage:
createdb [OPTION]... [DBNAME] [DESCRIPTION]

Options:
-D, --tablespace=TABLESPACE default tablespace for the database
-e, --echo show the commands being sent to the server
-E, --encoding=ENCODING encoding for the database
-l, --locale=LOCALE locale settings for the database
--lc-collate=LOCALE LC_COLLATE setting for the database
--lc-ctype=LOCALE LC_CTYPE setting for the database
-O, --owner=OWNER database user to own the new database
-T, --template=TEMPLATE template database to copy
-V, --version output version information, then exit
-?, --help show this help, then exit

Connection options:
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port
-U, --username=USERNAME user name to connect as
-w, --no-password never prompt for password
-W, --password force password prompt
--maintenance-db=DBNAME alternate maintenance database
By default, a database with the same name as the current user is created.

Report bugs to <pgsql-bugs@postgresql.org>.
  2. We will run the following command to create a database using createdb:
$ createdb -e pgtest -O user1
SELECT pg_catalog.set_config('search_path', '', false)
CREATE DATABASE pgtest OWNER user1;
  3. We will then create a database using a template as follows:
$ createdb -e pgtest -O user1 -T percona
SELECT pg_catalog.set_config('search_path', '', false)
CREATE DATABASE pgtest OWNER user1 TEMPLATE percona;
  4. We will use help to list all the arguments for the dropdb utility:
$ dropdb --help
dropdb removes a PostgreSQL database.

Usage:
dropdb [OPTION]... DBNAME

Options:
-e, --echo show the commands being sent to the server
-i, --interactive prompt before deleting anything
-V, --version output version information, then exit
--if-exists don't report error if database doesn't exist
-?, --help show this help, then exit

Connection options:
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port
-U, --username=USERNAME user name to connect as
-w, --no-password never prompt for password
-W, --password force password prompt
--maintenance-db=DBNAME alternate maintenance database
Report bugs to <pgsql-bugs@postgresql.org>.
  5. We will now drop a database using dropdb:
$ dropdb -i pgtest
Database "pgtest" will be permanently removed.
Are you sure? (y/n) y

How it works

The best way to understand the options that can be passed to any utility is through --help. As seen in Step 1 and Step 4, we could list all the possible arguments we could pass to the createdb and dropdb utilities.

As seen in the options available with createdb in Step 1, we can use -e to print the commands sent to the server and -O to assign the ownership of the database to a user. Using -e does not stop createdb from running; it just prints the commands that createdb executes. As seen in Step 2, using -e and -O has created the database.

Another option available in PostgreSQL is creating a database using a template. Sometimes, we may create a template and wish to apply the same template to future databases. So, everything maintained inside the template database specified is copied to the database being created. As seen in step 3, we are creating a database named pgtest using a template database: percona.

When you wish to drop a database you have created, you could use the command seen in Step 5. It uses the dropdb utility to drop the database.

When you create a database or drop a database using any of the aforementioned utilities, you could simply use psql to list the databases you have created. We could either use the psql shortcut discussed in the previous recipe or query the catalog table: pg_database.

$ psql -c "\l"
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
pgtest | user1 | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)

$ psql -c "select oid, datname from pg_database"
oid | datname
-------+-----------
13881 | postgres
16385 | pgtest
1 | template1
13880 | template0
(4 rows)

There's more

We have seen how to create and drop a database using the createdb and dropdb utilities. The same can also be achieved using psql with the CREATE DATABASE and DROP DATABASE commands. It is of course straightforward to run a plain CREATE DATABASE or DROP DATABASE command. But when you need to combine it with several parameters, such as owner, encoding, tablespace, and connection limit, you may need to look up the correct syntax. To do that, you could use \help as seen in the following example:

$ psql
psql (13.1)
Type "help" for help.

postgres=# \help CREATE DATABASE
Command: CREATE DATABASE
Description: create a new database
Syntax:
CREATE DATABASE name
[ [ WITH ] [ OWNER [=] user_name ]
[ TEMPLATE [=] template ]
[ ENCODING [=] encoding ]
[ LC_COLLATE [=] lc_collate ]
[ LC_CTYPE [=] lc_ctype ]
[ TABLESPACE [=] tablespace_name ]
[ ALLOW_CONNECTIONS [=] allowconn ]
[ CONNECTION LIMIT [=] connlimit ]
[ IS_TEMPLATE [=] istemplate ] ]

postgres=# \help DROP DATABASE
Command: DROP DATABASE
Description: remove a database
Syntax:
DROP DATABASE [ IF EXISTS ] name

So, the command to create a database that is owned by user2 with a connection limit of 200 should be as follows:

postgres=# CREATE DATABASE mydb WITH OWNER user2 CONNECTION LIMIT 200;
CREATE DATABASE


postgres=# \l+ mydb
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
------+-------+----------+-------------+-------------+-------------------+---------+------------+-------------
mydb | user2 | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7809 kB | pg_default |
(1 row)

Similarly, the command to drop a database, mydb, is as follows:

postgres=# DROP DATABASE mydb ;
DROP DATABASE


postgres=# \l+ mydb
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
------+-------+----------+---------+-------+-------------------+------+------------+-------------
(0 rows)

Locating a database and a table on the file system

So far, we have seen how to create or drop a database and how to list all the databases or schemas or tables. We discussed earlier that there is a base subdirectory inside the data directory of a PostgreSQL cluster that contains the databases and their objects. In this recipe, we are going to discuss some of the interesting details, such as the file system layout of a database and a table in PostgreSQL.

Getting ready

To locate a table or a database, we need a running PostgreSQL cluster that we can connect to using psql. Additionally, we need access to the data directory on the database server (as the postgres OS user) in order to walk through the locations and see the files on disk.

How to do it...

In the following steps, we will see how to identify the directory specific to a database and how to locate a table on the file system:

  1. We will create a database for test purposes:
$ psql -c "CREATE DATABASE testing"
CREATE DATABASE
  2. Get the oid value of the database from pg_database as seen in the following command:
$ psql -c "select oid, datname from pg_database"
oid | datname
-------+-----------
13878 | postgres
16384 | testing
1 | template1
13877 | template0
(4 rows)
  3. Locate the database directory using the oid value from the previous step:
$ ls -ld $PGDATA/base/16384
drwx------. 2 postgres postgres 8192 Sep 1 07:14 /var/lib/pgsql/13/data/base/16384
  4. Create a table in the newly created database:
$ psql -d testing -c "CREATE TABLE employee (id int)"
CREATE TABLE
  5. Get the oid value of the table created in the previous step from pg_class as seen in the following command:
$ psql -d testing -c "select oid, relname from pg_class where relname = 'employee'"
oid | relname
-------+----------
16385 | employee
(1 row)
  6. Get the location of the table on the file system using the pg_relation_filepath function:
$ psql -d testing -c "select pg_relation_filepath('employee')"
pg_relation_filepath
----------------------
base/16384/16385
(1 row)

How it works...

In order to understand the location of a database in the file system, we shall create a database as seen in Step 1. Now, we could use the catalog pg_database to get the oid of the database we created, as seen in Step 2.

As you can see in the log of Step 2, there is a unique identifier for each database. While there are three default databases that are automatically created, the fourth database we have created (testing) has got a unique oid (object identifier) as well. In the previous recipes, we discussed that the base directory contains all the databases of the cluster. So, a subdirectory with the same name as the oid of the database is created inside this base directory. This directory contains all the objects created inside the database.

What this means is that when you create a database in PostgreSQL, a subdirectory with the same name as the oid of the newly created database is created inside the base directory, as seen in Step 3.

Now, let's understand the file system layout of a table. We can see that in action by creating an example table in the testing database as seen in Step 4.

In order to identify the file system layout of the table, there is a function called pg_relation_filepath(). This function can be used to know the location of the table and the file that corresponds to this table. But before using this function, let's see if there is a unique identifier for this table using pg_class as seen in Step 5.

So, from the log, we can see that 16385 is the oid of the employee table we have just created. Now, let's run the function to see where this table was created, as seen in Step 6.

As we can see in the log of Step 6, a file with the name 16385 (the same as the oid of the employee table) is created inside the 16384 directory. The directory 16384 corresponds to the oid of the database we created. So, this indicates that a table in PostgreSQL is created as a file inside the database directory inside the base directory. This base directory is, of course, one of the subdirectories of the data directory.
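As a quick verification, you can list that file directly on disk as the postgres OS user (the oid values 16384 and 16385 come from the earlier steps and will differ on your system):

$ ls -l $PGDATA/base/16384/16385

Note that once a table grows beyond 1 GB, PostgreSQL splits its storage into additional segment files named 16385.1, 16385.2, and so on in the same directory.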

Creating a schema in PostgreSQL

A schema in PostgreSQL is a logical entity used to group together a list of tables, serving a specific purpose. It is not mandatory to create schemas in PostgreSQL. There exists a default public schema that can be used to create all tables. However, creating schemas has several advantages. Let's look at a few:

  1. Let's say there are multiple applications writing to the same database. Each application's logic may serve a specific purpose. If we have a separate schema for each application's logic, it makes the lives of administrators and developers easier. This is because, if there is a need to perform maintenance on the tables related to a specific application, an admin can distinguish it using the schema.
  2. In some applications, when a new user is created, the user is allocated a new set of objects that share the same name as the objects of another user in the same database. It is not possible to have two objects with the same name in a single schema. But, when we have two schemas, an object with the same name can exist in different schemas, for example:
    Table employee in user1 schema ⇒ user1.employee
    Table employee in user2 schema ⇒ user2.employee
  3. User management can be simplified when we use schemas in PostgreSQL. If there are multiple application modules connecting to the same database, each module may need to access a certain set of tables. And each developer may need access to a specific set of objects related to that application module but not all. If schemas are used, users and applications can be granted read or read-write access to specific schemas through roles in a more simplified way.

In this recipe, we shall discuss the purpose of a schema and how it can be created.

Getting ready

To create a schema, we must have a PostgreSQL cluster that is running and can be connected using psql. Additionally, it requires either a user with the superuser role or ownership of the database to create a schema in the database.

When you create a schema in PostgreSQL, it does not create a user with the same name as the schema, unlike Oracle, where users and schemas are tightly coupled. Users need to be created explicitly.

How to do it...

The following are the steps involved in creating a schema in a PostgreSQL database:

  1. Connect to the database:
$ psql -d percona
psql (13.1)
Type "help" for help.

percona=#
  2. Create the schema using the CREATE SCHEMA command:
percona=# CREATE SCHEMA foo;
CREATE SCHEMA
  3. Use IF NOT EXISTS for scripting purposes:
percona=# CREATE SCHEMA foo;
ERROR: schema "foo" already exists
percona=# CREATE SCHEMA IF NOT EXISTS foo;
NOTICE: schema "foo" already exists, skipping
CREATE SCHEMA

How it works...

In order to create a schema, you should first choose the database where you need to create the schema and connect to the database using psql, as seen in Step 1. Once you have connected to the database, you could simply use the command seen in Step 2 to create the schema.

Sometimes, you may wish to automate schema creation through scripting and avoid errors if the schema already exists. For that purpose, we could use the option seen in Step 3, to avoid printing an error when the schema already exists and create it only if it does not exist.

When you create a schema in PostgreSQL, the user who issued the CREATE SCHEMA command gets the ownership by default. This can be modified by using AUTHORIZATION. The following is the log that shows that the ownership is automatically assigned to current_user:

percona=# select current_user;
current_user
--------------
postgres
(1 row)

percona=# CREATE SCHEMA foo;
CREATE SCHEMA
percona=#
percona=# SELECT n.nspname AS "Name",
pg_catalog.pg_get_userbyid(n.nspowner) AS "Owner"
FROM pg_catalog.pg_namespace n
WHERE n.nspname = 'foo';
Name | Owner
------+----------
foo | postgres
(1 row)

In order to assign ownership of a schema to another user, either execute CREATE SCHEMA as that user or use AUTHORIZATION. A user needs the CREATE privilege on the database to execute CREATE SCHEMA. In the following log, user1 had already been granted the CREATE privilege, so CREATE SCHEMA succeeded with no errors:

$ psql -d percona -U user1
psql (13.1)
Type "help" for help.

percona=> select current_user;
current_user
--------------
user1
(1 row)

percona=> CREATE SCHEMA foo;
CREATE SCHEMA

percona=> SELECT n.nspname AS "Name",
pg_catalog.pg_get_userbyid(n.nspowner) AS "Owner"
FROM pg_catalog.pg_namespace n
WHERE n.nspname = 'foo';
Name | Owner
------+-------
foo | user1
(1 row)
percona=>

There's more...

When you are connecting as a superuser (postgres), you could grant the ownership to any user as you can see in the following log:

percona=# select current_user;
current_user
--------------
postgres
(1 row)
percona=# CREATE SCHEMA foo AUTHORIZATION user1;
CREATE SCHEMA
percona=#
percona=# SELECT n.nspname AS "Name",
pg_catalog.pg_get_userbyid(n.nspowner) AS "Owner"
FROM pg_catalog.pg_namespace n
WHERE n.nspname = 'foo';
Name | Owner
------+-------
foo | user1
(1 row)

When you use AUTHORIZATION to give ownership of a schema to a non-superuser (user2) as a non-superuser (user1), the user executing CREATE SCHEMA (user1 here) should be a member of the role (user2) who should own the schema. In the following log, you can see an error when user1 is giving authorization to user2 without being a member of user2:

percona=> CREATE SCHEMA foo AUTHORIZATION user2;
ERROR: must be member of role "user2"
postgres=# GRANT user2 to user1;
GRANT ROLE

percona=> CREATE SCHEMA foo AUTHORIZATION user2;
CREATE SCHEMA

Checking table and index sizes in PostgreSQL

In this recipe, we shall see some of the best ways to check the size of a table and index in PostgreSQL.

Getting ready

Creating a table in PostgreSQL requires you to have appropriate privileges. A newly created user can create a table in the public schema of any database. But, when you need to create a table inside another schema, it requires you to have sufficient privileges to do so. Similarly, it requires you to have sufficient privileges to read data from a table or perform writes to that table. However, it does not require you to have any privileges to see the size of a table in any schema.

How to do it...

The following are the steps involved in finding the size of a table and an index:

  1. We will use \dt+ to get the table size:
percona=# \dt+ foo.dept 
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+-------+----------+---------+-------------
foo | dept | table | postgres | 3568 kB |
(1 row)


--- To get size of a table or a set of tables matching a pattern, the following can be used.

percona=# \dt+ public.*bench*
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+---------+-------------
public | pgbench_accounts | table | postgres | 1281 MB |
public | pgbench_branches | table | postgres | 40 kB |
public | pgbench_history | table | postgres | 0 bytes |
public | pgbench_tellers | table | postgres | 80 kB |
(4 rows)
  2. We will use \di+ to get the index size:
 percona=# \di+ foo.dept_pkey
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------+-------+----------+-------+---------+-------------
foo | dept_pkey | index | postgres | dept | 2208 kB |
(1 row)
  3. We will use pg_relation_size to get the table and index size:
 percona=# select pg_relation_size('foo.dept');
pg_relation_size
------------------
3629056
(1 row)

How it works...

Database objects such as tables, indexes, views, materialized views, and so on may also be called relations in PostgreSQL. The three relations that can grow in size are tables, indexes, and materialized views. There are multiple ways to find the size of a table or an index in PostgreSQL. While one of them is using psql shortcuts, the other method is through a function called pg_relation_size().

As seen in Step 1, we can pass the fully qualified table name (schemaname.tablename) to \dt+ after connecting to the appropriate database using psql. And in order to find the size of an index, we can pass the fully qualified index name (that is, schemaname.indexname) to \di+ after connecting to the appropriate database using psql, as seen in Step 2.

We can also find the size of these relations using pg_relation_size() easily, as seen in Step 3. We need to make sure that we pass a fully qualified relation name with the schema prefix to the pg_relation_size() function.

There's more...

When we used the pg_relation_size() function earlier, we noticed that the size of a relation is displayed in bytes, which is not easily readable. For that purpose, we have the pg_size_pretty() function. This function converts the size in bytes into the nearest easily readable unit, such as kB, MB, or GB. The following example should help to understand this in practice:

postgres=# SELECT schemaname, relname, pg_size_pretty(pg_relation_size(schemaname||'.'||relname)) as pretty_size FROM pg_stat_user_tables where schemaname = 'foo' and relname IN ('employee','sales','bar');
schemaname | relname | pretty_size
------------+----------+-------------
foo | employee | 360 kB
foo | bar | 640 MB
foo | sales | 13 GB
(3 rows)
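Also note that pg_relation_size() reports only the main data fork of a table. If you want the total on-disk footprint, including its indexes and TOAST data, the pg_total_relation_size() function can be combined with pg_size_pretty(); a minimal sketch using the foo.employee table from the preceding example:

postgres=# SELECT pg_size_pretty(pg_total_relation_size('foo.employee'));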

Creating tablespaces

A tablespace in PostgreSQL can be used to distribute database objects such as tables and indexes to different disks/locations. This is especially helpful in distributing the IO across multiple disks and avoiding IO saturation on a single disk. In this recipe, we shall see the steps involved in creating tablespaces in PostgreSQL.

Getting ready

A tablespace directory needs to be created on the file system before creating them in the database. We should have access to the operating system as a root user or a user with sudo access to create directories on the mount points that are owned by the root user.

When you create a tablespace on the primary of a streaming replication cluster, you must make sure that the tablespace directory also exists on the standby server. Similarly, when you restore a backup from a PostgreSQL cluster that has one or more tablespaces, you must make sure to create the respective tablespace directories before performing the restore.

How to do it...

The following steps can be used to create a tablespace in PostgreSQL:

  1. Create a directory as shown:
$ sudo mkdir -p /newtablespace
$ sudo chown postgres:postgres /newtablespace
  2. Now create a tablespace using the newly created directory by running the following command:
$ psql -c "CREATE TABLESPACE newtblspc LOCATION '/newtablespace'"
  3. Check the pg_tblspc directory to see a symlink to the new tablespace:
$ ls -l $PGDATA/pg_tblspc
total 0
lrwxrwxrwx. 1 postgres postgres 14 Nov 3 00:24 24611 -> /newtablespace

How it works

To create a tablespace, we must first create the directory in which the tablespace will be stored, as seen in step 1. The benefits of having a separate tablespace are realized when it is created on a disk other than the one used by the data directory. Now, to create a tablespace using the newly created directory, we could simply use the command seen in step 2.

When we create a tablespace, we see a new entry in the pg_tblspc directory that has a symlink to the new tablespace location as seen in the output of step 3. There will be many such entries when we create more tablespaces.

Once you have created tablespaces, you could simply validate all the tablespaces and their location using the shortcut \db as seen in the following log:

$ psql
psql (13.1)
Type "help" for help.

postgres=# \db
List of tablespaces
Name | Owner | Location
------------+----------+----------------
newtblspc | postgres | /newtablespace
pg_default | postgres |
pg_global | postgres |
(3 rows)
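The same information is also available through a catalog query, which can be more convenient in scripts; the pg_tablespace_location() function resolves the on-disk location for each tablespace oid (the built-in pg_default and pg_global tablespaces return an empty location):

postgres=# SELECT spcname, pg_tablespace_location(oid) FROM pg_tablespace;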

There's more...

In order to create a table in the new tablespace, we may just append the appropriate tablespace name to the CREATE TABLE command:

postgres=# create table employee (id int) TABLESPACE newtblspc;
CREATE TABLE

If you create a table inside the new tablespace, the relation path points into the tablespace directory. The following log shows how the table maps to the appropriate tablespace:

postgres=# select pg_relation_filepath('employee');
pg_relation_filepath
---------------------------------------------
pg_tblspc/24611/PG_13_201909212/14187/24612
(1 row)

And now, if you describe the table, you should be able to see the tablespace in which the table got created:

postgres=# \d employee
Table "public.employee"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
Tablespace: "newtblspc"

With the preceding output, it is clear that the table is created inside the new tablespace.
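If you want new tables in a session to land in the new tablespace without adding a TABLESPACE clause every time, the default_tablespace parameter can be set; the following is a short sketch (the audit_log table name is just an example):

postgres=# SET default_tablespace = newtblspc;
SET
postgres=# CREATE TABLE audit_log (id int);
CREATE TABLE

The same parameter can also be set persistently for a role or a database using ALTER ROLE ... SET or ALTER DATABASE ... SET.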

Moving tables to a different tablespace

After a few months or years, the data in your database grows. When the number of transactions increases along with the volume of data, you may wish to distribute objects, especially tables and indexes, across multiple tablespaces to spread the IOPS. In this recipe, we shall see the steps involved in moving existing tables and indexes to different tablespaces.

Getting ready

To move tables and indexes to a different tablespace, we can use the ALTER TABLE syntax. We must be a superuser or the owner of the table to run ALTER TABLE.

Running the ALTER TABLE command could cause downtime for the application, as it acquires an exclusive lock on the table while moving it to a different tablespace. So the downtime needs to be planned appropriately before proceeding. Alternatively, extensions such as pg_repack can be used to move tables and indexes to different tablespaces online. pg_repack will be discussed in future chapters.

How to do it...

The following steps can be performed to move a table to a different tablespace:

  1. To move a table to another tablespace, the syntax looks like the following:
ALTER TABLE percona.foo SET TABLESPACE newtblspc;
  2. To move an index to a new tablespace, the syntax looks like the following:
ALTER INDEX percona.foo_id_idx SET TABLESPACE newtblspc;

How it works

In order to move a table from one tablespace to another, we could simply use the ALTER TABLE command as seen in the following syntax:

ALTER TABLE <schemaname.tablename> SET TABLESPACE <tablespace_name>;

As an example, we could see the command that can be used to move a table, percona.foo, to a tablespace named newtblspc in step 1.

Similarly, in order to move an index to a tablespace, the syntax appears like the following:

ALTER INDEX <schemaname.indexname> SET TABLESPACE <tablespace_name>;

An example command to move the index percona.foo_id_idx to tablespace newtblspc can be seen in step 2.
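When many relations need to be moved at once, PostgreSQL also supports a bulk form of the same command. The following sketch moves every table and index currently stored in the pg_default tablespace (restricted to relations the executing user owns, unless run as a superuser) to newtblspc; it takes the same exclusive locks as the per-table commands:

ALTER TABLE ALL IN TABLESPACE pg_default SET TABLESPACE newtblspc;
ALTER INDEX ALL IN TABLESPACE pg_default SET TABLESPACE newtblspc;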

Creating a user in PostgreSQL

In order to connect to a PostgreSQL database, we need to have a username. A PostgreSQL user is a role that has the LOGIN attribute. Both CREATE USER and CREATE ROLE work well to create a PostgreSQL user. The only difference between the two is that the LOGIN attribute is not assigned when we use CREATE ROLE to create a user. In this recipe, we shall see how a user can be created in PostgreSQL.

Getting ready

Creating a user or a role requires either the superuser or the CREATEROLE privilege. So, we must connect as a user who has one of these privileges in order to create a user.

How to do it...

A user with the LOGIN attribute can be created using any of the following three methods:

  1. Method 1: We can create a user using the CREATE USER command:
CREATE USER percuser WITH ENCRYPTED PASSWORD 'secret';
  2. Method 2: We can create a user using the CREATE ROLE .. WITH LOGIN command:
CREATE ROLE percuser WITH LOGIN ENCRYPTED PASSWORD 'secret';
  3. Method 3: We can create a user using the CREATE ROLE command and then assign the LOGIN attribute to that user:
CREATE ROLE percuser;
ALTER ROLE percuser WITH LOGIN ENCRYPTED PASSWORD 'secret';

The three aforementioned methods demonstrate the three different ways in which a user, percuser, can be created.

How it works

When you create a user with any of these commands, PostgreSQL internally translates it into the following:

CREATE ROLE percuser
WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB LOGIN NOREPLICATION NOBYPASSRLS
ENCRYPTED PASSWORD 'secret';

In order to validate whether the user can log in (has the LOGIN attribute) or not, we may use the following query and substitute the username appropriately:

postgres=# select rolcanlogin from pg_roles where rolname = 'percuser';
rolcanlogin
-------------
t
(1 row)
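The attributes shown in the expanded command above (NOSUPERUSER, NOCREATEDB, and so on) can be changed later with ALTER ROLE. For example, to allow percuser to create databases, the following could be used:

postgres=# ALTER ROLE percuser WITH CREATEDB;
ALTER ROLE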

There's more...

In order to list the users and roles created in a PostgreSQL server, we could either query the pg_roles view or use the \du shortcut.

We can use the \du shortcut to get the list of users:

$ psql -c "\du"
List of roles
Role name | Attributes | Member of
------------------+------------------------------------------------------------+-----------
app_user | | {}
dev_user | | {}
percuser | | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
read_only_scott | Cannot login | {}
read_write_scott | Cannot login | {}

We can also query pg_roles to get the list of users:


$ psql -c "SELECT rolname, rolcanlogin FROM pg_roles where rolname NOT LIKE 'pg_%'"
rolname | rolcanlogin
------------------+-------------
postgres | t
percuser | t
read_write_scott | f
read_only_scott | f
app_user | t
dev_user | t
(6 rows)

Thus, we have learned how to create users with various privileges in PostgreSQL.

Dropping a user in PostgreSQL

We may need to drop users who no longer need access to PostgreSQL. Dropping a user becomes challenging when the user being dropped owns one or more objects. In that case, we may have to reassign the ownership of those objects to another user so that the user can be dropped without dropping the objects it owns. In this recipe, we shall see how a user can be dropped safely and the best practices to follow to avoid dropping the objects a user owns.

Getting ready

To drop a user, a simple DROP USER or DROP ROLE command is sufficient. However, this only works without errors when there are no objects owned by the user being dropped. Otherwise, an error like the following appears:

postgres=# DROP USER percuser;
ERROR: role "percuser" cannot be dropped because some objects depend on it
DETAIL: privileges for database percona
2 objects in database pmm
2 objects in database percona

How to do it ...

The following steps need to be followed to complete the recipe:

  1. When a user that does not own any objects has to be dropped, it could be done using a simple command, as follows:
$ psql -c "DROP USER percuser"
  2. Re-assign the ownership of objects that are owned by the user that needs to be dropped:
$ psql -U postgres -d percona -c "REASSIGN OWNED by percuser TO pmmuser"
  3. Revoke the privileges from the user that is being dropped:
$ psql
psql (13.1)
Type "help" for help.
postgres=# REVOKE ALL PRIVILEGES ON DATABASE percona FROM percuser;
REVOKE
postgres=# REVOKE ALL PRIVILEGES ON DATABASE pmm FROM percuser;
REVOKE
  4. Drop the user after the revocation is successful:
postgres=# DROP USER percuser ;
DROP ROLE

How it works ...

When the user who needs to be dropped does not own any objects, the command to drop a user as seen in step 1 would succeed without any errors. However, if the user is an OWNER of one or more objects of one or more databases, then the ownership of the objects owned by the user being dropped must be reassigned to another user. This can be done using REASSIGN OWNED as a superuser.

Dropping a user that owns one or more objects of one or more databases can be done using three simple steps. The first step is to reassign the ownership of the objects owned by the user to another user. This can be done using a simple command as seen in step 2.

If there exists another database that has some objects owned by the user being dropped, the ownership of the objects must be reassigned separately for the objects in that database as well:

$ psql -U postgres -d pmm -c "REASSIGN OWNED by percuser TO pmmuser"

Notice the difference in the two commands – the one we saw in step 2 and the preceding command. The first one connects to the percona database and the second command connects to the pmm database and reassigns ownership to pmmuser.

Once the ownership of the objects has been reassigned using the command mentioned in step 2, all the privileges granted to that user must be revoked using the commands seen in step 3.

After revoking the privileges from the user being dropped, we can safely use the DROP USER command to drop the user as seen in step 4.
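If the objects owned by the user are genuinely no longer needed, an alternative to reassigning them is DROP OWNED, which drops the user's objects in the current database and also revokes any privileges granted to the user there. This is destructive, so treat the following only as a sketch for the case where the objects are disposable; like REASSIGN OWNED, it has to be run in every database that contains such objects:

postgres=# DROP OWNED BY percuser;
DROP OWNED
postgres=# DROP USER percuser;
DROP ROLE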

Assigning and revoking a privilege to/from a user or a role

Once a user is created, the next task is to grant privileges or sometimes to revoke granted privileges from the user. PostgreSQL supports several privileges that are similar to other relational databases. The following is a list of available privileges that can be granted to a user or revoked from a user/role:

  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • TRUNCATE
  • REFERENCES
  • TRIGGER
  • CREATE
  • CONNECT
  • TEMPORARY
  • EXECUTE
  • USAGE

In this recipe, we shall see how we can assign and revoke a privilege to/from a user.

Getting ready

To grant or revoke privileges from a user, the GRANT and REVOKE commands are used. It is wise to perform GRANT or REVOKE as a database user that has the superuser role, or as the owner of the schema and objects in question.

How to do it

The following steps need to be followed to understand the recipe:

  1. To grant a select privilege on a table, employee, to a user, percuser, the following GRANT command could be used:
GRANT SELECT ON employee TO percuser;
  2. Now we revoke SELECT from the user:
REVOKE SELECT ON employee FROM percuser;
  3. GRANT all the privileges possible on the employee table to percuser:
GRANT ALL ON employee TO percuser;
  4. REVOKE all the privileges on employee from percuser:
REVOKE ALL ON employee FROM percuser;
  5. GRANT all privileges on a database to a user:
GRANT ALL ON DATABASE percona TO percuser ;
  6. REVOKE all privileges from a user:
REVOKE ALL ON DATABASE percona FROM percuser ;

How it works

In order to assign privileges, a GRANT command must be used. And to revoke the assigned privilege, a REVOKE command must be used. For example, to assign a SELECT privilege on the employee table to a user, percuser, the command seen in step 1 can be used. And to revoke the SELECT privilege on the employee table from percuser, the command seen in step 2 can be used.

When a privilege is granted to a user or a role, it takes effect immediately without the need for SIGHUP or a reload. At the same time, a user can also be granted all the privileges that can be assigned to an object depending on the object type.

The command seen in step 3 can be used to grant all the privileges possible on the employee table to percuser. And to revoke all the privileges, the command seen in step 4 can be used. We could similarly allocate all privileges on a database to a user as seen in step 5. And to revoke the privileges, the command seen in step 6 can be used.
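To verify what has actually been granted, you can either use the \dp shortcut in psql, which lists the access privileges of a table, or call the has_table_privilege() function for a specific user and privilege; both of the following are run from within psql:

\dp employee
SELECT has_table_privilege('percuser', 'employee', 'SELECT');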

Creating a group role for role-based segregation

Managing privileges is a very challenging task for admins when there are tens or hundreds of users in a database that need to be assigned privileges to access or modify database objects. It becomes time-consuming to manage each user individually and assign SELECT or INSERT privileges on hundreds of objects. For this reason, it is always recommended to grant access privileges to database users using group roles. In this recipe, we shall see how group roles can be used for role-based segregation.

Getting ready

A role in Postgres can inherit from another role; that is, one role can be granted the privileges of another role. This is achieved by granting one role to another with GRANT, and a member role with the INHERIT attribute automatically uses the privileges of the roles it belongs to. So, there can be a read-only role and a read-write role for each schema, or for a set of objects belonging to a specific schema or an application's logic/module. If a user needs to access the objects of a specific application module, just the role belonging to that application module can be granted to the user. This helps in achieving better user management.

How to do it

The following steps need to be followed to complete the recipe:

  1. Create the read-only and read-write roles for each schema respectively:
CREATE ROLE scott_readonly;
CREATE ROLE scott_readwrite;
CREATE ROLE tiger_readonly;
CREATE ROLE tiger_readwrite;
  2. Grant usage on the schemas and SELECT access on their tables to the associated read-only roles:
GRANT USAGE ON SCHEMA scott TO scott_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA scott TO scott_readonly;
GRANT USAGE ON SCHEMA tiger TO tiger_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA tiger TO tiger_readonly;
  3. Grant usage on the schemas and write access on their tables to the appropriate read-write roles:
GRANT USAGE ON SCHEMA scott TO scott_readwrite;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA scott TO scott_readwrite;
GRANT USAGE ON SCHEMA tiger TO tiger_readwrite;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA tiger TO tiger_readwrite;
  4. Assign read-write roles to application users and read-only roles to individual users:
GRANT scott_readwrite to appuser1;
GRANT tiger_readwrite to appuser2;

GRANT scott_readonly to devuser1;
GRANT tiger_readonly to devuser2;

How it works

In order to understand how to implement the group roles in a better way, consider an example where there is a company that has one PostgreSQL database (salesdb) with two schemas – scott and tiger.

Now, the following are the requirements:

  • Create 100 individual user accounts and 5 application users.
  • Grant read access on the 2 schemas to 100 individual user accounts.
  • Grant read and write access on 2 schemas to 5 application users.

There are multiple ways to achieve this. One way is to grant access to each of these objects to all the user accounts individually. But isn't that very time-consuming? The other way is to use group roles.

In order to use group roles, we need to create two roles for each schema, as seen in step 1 – one for read-only and the other for read-write for both scott and tiger schemas:

  • scott_readonly: The READ-ONLY role to perform READS on the schema scott.
  • scott_readwrite: The READ-WRITE role to perform WRITES on the schema scott.
  • tiger_readonly: The READ-ONLY role to perform READS on the schema tiger.
  • tiger_readwrite: The READ-WRITE role to perform WRITES on the schema tiger.

And then the reads and writes on the two schemas can be granted to their associated roles as seen in step 2 and step 3. And those roles can be granted to the individual users and the application users as seen in step 4.

By using the preceding approach, a role can be directly assigned to a new user instead of granting appropriate privileges on each object explicitly. This helps in achieving proper user management through role-based segregation.
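One caveat with this approach: GRANT ... ON ALL TABLES IN SCHEMA only covers tables that already exist. To make sure tables created in the future are automatically covered by the group roles, ALTER DEFAULT PRIVILEGES can be used. The following sketch applies to tables created in the scott schema by the user running the command (the FOR ROLE clause can be added to target a different creating role):

ALTER DEFAULT PRIVILEGES IN SCHEMA scott GRANT SELECT ON TABLES TO scott_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA scott GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO scott_readwrite;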

MVCC implementation and VACUUM in PostgreSQL

The MVCC implementation in PostgreSQL is unique when compared to Oracle and MySQL-like relational databases. MVCC stands for Multi-Version Concurrency Control. As the name suggests, MVCC is needed to provide consistency while running concurrent transactions, so that readers and writers do not block each other.

To understand it better, consider a hypothetical situation where transaction A started at 9:00 a.m. to get a count of all the records in a table: foo.bar (with 10,000,020 records). As it is a very large table, let's say the query takes 20 minutes to complete. Another transaction, B, started at 9:10 a.m. to delete 20 records from the same table. When transaction A, which started at 9:00 a.m., completes at 9:20 a.m., it should still see the same records as it did at 9:00 a.m., that is, 10,000,020 records, without considering transaction B, which deleted some records at 9:10 a.m. Though the exact behavior depends on the isolation level, PostgreSQL is still able to provide a consistent view of the data as it was when the query ran. How does it work? What is happening internally? We will discuss these things in this recipe.

Throughout this book, you shall see the words records or tuples (of a table). A record in PostgreSQL is mostly referred to as a tuple. Also, PostgreSQL may be referred to as Postgres or PG in many places. They are one and the same.

Getting ready

Oracle and MySQL-like databases have separate UNDO storage that stores the past images required for consistency. If an existing record of a table is modified (updated or deleted), the past image is copied to a separate location. This way, if there is an existing transaction that started before the record got modified, it can still access the record as it was before it got modified. However, this UNDO is maintained in a separate location, not within the same table.

In PostgreSQL, there is no separate UNDO storage; the undo information is maintained within the table itself. What this means is that the tuple before modification and the modified tuple are both stored in the same table.

How to do it...

In the following steps, we shall understand how PostgreSQL implements MVCC by explaining some of the system columns in detail. We shall also consider a simple example where we create a table with two columns, insert some records, and see the transaction IDs assigned to these records. We shall then query system columns such as xmin and xmax and understand how multiple versions of rows are maintained within the same table. This exercise will not only help you understand MVCC but will also show you some of the common queries that are useful in your daily admin life:

  1. Create a schema and a table with two columns and insert some records into it:
postgres=# CREATE SCHEMA foo;
CREATE SCHEMA
postgres=# CREATE TABLE foo.bar (id int, name varchar(5));
CREATE TABLE
postgres=# INSERT INTO foo.bar VALUES (generate_series(1,5),'avi');
INSERT 0 5
  2. Query the pg_attribute table to see the system columns that got added to the table along with the two columns id and name:
postgres=# SELECT attname, format_type (atttypid,atttypmod)
FROM pg_attribute
WHERE attrelid = 'foo.bar'::regclass::oid
ORDER BY attnum;

attname | format_type
----------+----------------------
tableoid | oid
cmax | cid
xmax | xid
cmin | cid
xmin | xid
ctid | tid
id | integer
name | character varying(5)
(8 rows)
  3. We shall then select all the columns from the table using the select * from table command and observe that none of the system columns appear in the output:
postgres=# SELECT * FROM foo.bar LIMIT 1;
id | name
----+------
1 | avi
(1 row)
  4. Now, to select the values of a system column explicitly, we shall include the system column name in the select command and see what it stores:

postgres=# select xmin,* from foo.bar limit 1;
xmin | id | name
-------+----+------
11705 | 1 | avi
(1 row)
  5. Let's query the pg_class table to see the oid of the table created in step 1:
postgres=# SELECT oid, relname FROM pg_class WHERE relname = 'bar';
oid | relname
-------+---------
31239 | bar
(1 row)
  6. If we have two tables with the same name, bar, but in different schemas, they do not share the same oid, as seen in the following example. In this example, we shall create another table in a different schema than the one created in step 1 and see that the oid is different for both:
postgres=# CREATE TABLE public.bar (id int, name varchar(5));
CREATE TABLE

postgres=# SELECT oid, relname FROM pg_class WHERE relname = 'bar' and relkind = 'r';
oid | relname
-------+---------
31242 | bar
31239 | bar
(2 rows)
  7. To properly identify the table that belongs to a specific schema, we could join pg_namespace with pg_class as seen in the following log:
postgres=# SELECT pc.oid, pn.nspname, pc.relname
FROM pg_class pc
JOIN pg_namespace pn ON pc.relnamespace = pn.oid
WHERE pn.nspname = 'foo'
AND pc.relname = 'bar';
oid | nspname | relname
-------+---------+---------
31239 | foo | bar
(1 row)
  8. We could also use regclass to identify the oid of a fully qualified table. A fully qualified table is a table specified along with its schemaname (schemaname.tablename):
postgres=# select 'foo.bar'::regclass::oid;
oid
-------
31239
(1 row)
  9. In this step, we will see how the system column tableoid can be seen from the table for each record and understand that it is the same as the oid of the table:
postgres=# select tableoid, id, name from foo.bar limit 1;
tableoid | id | name
----------+----+------
31239 | 1 | avi
(1 row)
  10. Every transaction in PostgreSQL has a unique transaction ID. In this step, we shall see how a transaction ID remains the same within a transaction block and changes for a new transaction:
postgres=# BEGIN;
BEGIN
postgres=# select txid_current();
txid_current
--------------
11902
(1 row)

postgres=# select txid_current();
txid_current
--------------
11902
(1 row)
postgres=# END;
COMMIT

postgres=# select txid_current();
txid_current
--------------
11903
(1 row)
  11. By querying xmin explicitly, we can see the transaction ID that inserted the records by finding the xmin value of each record. Notice the xmin values of all the records in the following log:
postgres=# select xmin,* from foo.bar; 
xmin | id | name
-------+----+------
11705 | 1 | avi
11705 | 2 | avi
11705 | 3 | avi
11705 | 4 | avi
11705 | 5 | avi
11905 | 6 | avi
(6 rows)
  12. We could also find the xmax of each record by explicitly selecting it. If xmax is set to 0, it was never deleted and is visible:
postgres=# select xmin, xmax, * from foo.bar ;
xmin | xmax | id | name
-------+------+----+------
11705 | 0 | 1 | avi
11705 | 0 | 2 | avi
11705 | 0 | 3 | avi
11705 | 0 | 4 | avi
11705 | 0 | 5 | avi
11905 | 0 | 6 | avi
11907 | 0 | 7 | avi
(7 rows)
  13. If we perform a delete operation to delete a record, subsequent select queries cannot see the deleted record anymore:
postgres=# BEGIN;
BEGIN
postgres=# DELETE FROM foo.bar WHERE id = 7;
DELETE 1
postgres=# COMMIT;
COMMIT
postgres=# select xmin, xmax, * from foo.bar ;
xmin | xmax | id | name
-------+------+----+------
11705 | 0 | 1 | avi
11705 | 0 | 2 | avi
11705 | 0 | 3 | avi
11705 | 0 | 4 | avi
11705 | 0 | 5 | avi
11905 | 0 | 6 | avi
(6 rows)
  14. Now, let's use two terminals in parallel. In one terminal, we shall delete a record and then, from another terminal, observe the xmin and xmax values of the record being deleted before the delete is committed:
  • Terminal 1: Running delete but not committing it. Note the transaction ID that performed delete:
postgres=# BEGIN;
BEGIN
postgres=# select txid_current();
txid_current
--------------
11911
(1 row)

postgres=# DELETE FROM foo.bar WHERE id = 6;
DELETE 1
  • Terminal 2: We can see the xmax value changed to the transaction ID that executed delete in terminal 1:
postgres=# select xmin, xmax, * from foo.bar ;
xmin | xmax | id | name
-------+-------+----+------
11705 | 0 | 1 | avi
11705 | 0 | 2 | avi
11705 | 0 | 3 | avi
11705 | 0 | 4 | avi
11705 | 0 | 5 | avi
11905 | 11911 | 6 | avi
(6 rows)
  15. Roll back the delete and now see the xmax value:
  • Terminal 1: Let's issue rollback instead of commit so that the record is not deleted:
postgres=# BEGIN;
BEGIN
postgres=# select txid_current();
txid_current
--------------
11911
(1 row)

postgres=# DELETE FROM foo.bar WHERE id = 6;
DELETE 1
postgres=# ROLLBACK;
ROLLBACK
  • Terminal 2: We can see that the xmax value still remains the same, but internally, hint bits will mark the deleting transaction as rolled back:
$ psql -d postgres -c "select xmin, xmax, id, name from foo.bar"
xmin | xmax | id | name
-------+-------+----+------
11705 | 0 | 1 | avi
11705 | 0 | 2 | avi
11705 | 0 | 3 | avi
11705 | 0 | 4 | avi
11705 | 0 | 5 | avi
11905 | 11911 | 6 | avi
(6 rows)
  16. We can see the location of each tuple within the table by querying the system column ctid:
postgres=# select xmin, xmax, ctid, * from foo.bar ;
xmin | xmax | ctid | id | name
-------+-------+-------+----+------
11705 | 0 | (0,1) | 1 | avi
11705 | 0 | (0,2) | 2 | avi
11705 | 0 | (0,3) | 3 | avi
11705 | 0 | (0,4) | 4 | avi
11705 | 0 | (0,5) | 5 | avi
11905 | 11911 | (0,6) | 6 | avi
(6 rows)

How it works...

In order to understand how MVCC works in PostgreSQL, it is important to understand some of the system columns of a table in PostgreSQL. The preceding example contains a demonstration of the hidden columns of a table in PostgreSQL along with the changes to their values when their corresponding records are modified.

If you observe Step 1, it is visible that a table with the name foo.bar has been created with just two columns. However, when you see the output in Step 2, it is interesting to see that it is not just two columns but there are some additional columns that are automatically created by PostgreSQL.

Well, through the output, it is clear that there are six additional columns beyond the two we defined when creating the table using the CREATE TABLE syntax. To understand how these columns make a significant difference to the way MVCC is implemented in PostgreSQL, let's learn about these system columns in detail.

Though these columns are considered to be hidden, it doesn't mean that the values in the columns are a mystery to an admin. The reason why these columns are considered hidden is that they are excluded from the output of select * from table, as seen in the output of Step 3.

In order to see what values are stored in these hidden columns, they need to be explicitly listed in the SELECT statement, as seen in Step 4. In this example, we see the difference between selecting all the columns of a table versus explicitly selecting a system column along with the actual columns.

tableoid

Now, before learning about tableoid, it is important to understand what an OID is. An OID in PostgreSQL stands for an Object Identifier. When a table is created in PostgreSQL, a new record with the table name and the schema name is inserted into the system tables – pg_class and pg_namespace. OIDs are used by PostgreSQL internally as a primary key for such system tables. In order to find the oid of the table foo.bar that was created earlier, the easiest way is to query the pg_class system table as seen in Step 5.

But, what if there is more than one table with the same name but in two different schemas? In PostgreSQL, it is possible to have more than one schema in a single database. For example, if we observe the output in Step 6, it is visible that a table with the same name as the table created in Step 1 was created in a different schema as well.

Thus, in order to find the oid of the table that corresponds to the appropriate schema, pg_class can be joined with the system table pg_namespace (which contains the schema name and the oid of the schema). For every relation in pg_class, the oid of its schema is also inserted. To see that in action, the log in Step 7 contains simple SQL to identify the oid of a specific table that belongs to a specific schema.

There is another easy way to find the OID of a table, using regclass. Substitute foo.bar with the schema name and table name as seen in Step 8.

Now to understand tableoid in a simple way, it is nothing but a column that contains the oid of the table, which is the same as the oid visible in the pg_class table. See Step 9, which illustrates how we can select the tableoid along with the other columns of a table.

xmin

xmin is one of the important columns that a PostgreSQL admin should be fully aware of. An admin's day-to-day activity totally depends on understanding xmin very well. To understand xmin better, let's learn about transaction IDs in PostgreSQL. We are not going to discuss problems with transaction IDs in this chapter; this is just an introduction. For now, let's remember that a transaction ID is a unique identifier assigned to a transaction.

A transaction ID in PostgreSQL is a 32-bit unsigned integer. It is cyclic, which means that it starts from 0 and goes up to 4.2 billion (4,294,967,295) and then starts from 0 again. The function txid_current() shows the ID of the current transaction. If we observe the output in Step 10 carefully, we see that the transaction ID stayed the same within the entire transaction (between BEGIN and END) but it changed incrementally for another new transaction.

As we've understood the transaction ID now, xmin is nothing but the transaction ID that inserted that tuple. For example, in the output of Step 11, we can see that the first five records were inserted by a transaction with the ID 11705 and the last record was inserted by a transaction with the ID 11905.

This difference in xmin is essential in determining what tuples are visible to a transaction. For example, an SQL statement in a transaction that started before 11905 may not be able to see the records inserted by its future transactions.

xmax

The xmax value makes a significant difference when there are tuples that are being deleted or updated. Before we start to learn about xmax, see the log in Step 12, which shows the xmax value of the records in the foo.bar table.

In the log, we see that the value of xmax is 0. The value of xmax is 0 when it is a row that was never deleted or attempted for delete. There are two scenarios that could happen when you consider deleting a record:

  1. A delete command was issued by a transaction and it was committed.
  2. A delete command was issued by a transaction but it hasn't been committed yet.

In the first scenario, it is quite understandable that when a delete was issued and committed, the record is no longer visible, as seen in Step 13. So, there is no point in discussing the xmax value for that record.

But what about the second scenario, where the delete has not been committed yet? To demonstrate it, I issued a delete in one terminal and looked at the xmax value from another terminal, as seen in Step 14. If you look at the terminal 2 log carefully, xmax has been set to the ID of the transaction that issued the delete. Note that xmax keeps that value even if the deleting transaction later issues a ROLLBACK, whereas if the delete is committed, the record is, as discussed earlier, no longer visible to future selects.

As seen in Step 15, if I issue a ROLLBACK instead of a COMMIT, xmax still holds the ID of the transaction that issued the delete; the row simply remains visible.
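
The following is a sketch of this two-terminal experiment, assuming the row with id = 6 still exists in foo.bar:

-- Terminal 1
postgres=# BEGIN;
postgres=# DELETE FROM foo.bar WHERE id = 6;

-- Terminal 2 (while terminal 1 has not committed yet)
postgres=# SELECT xmin, xmax, id FROM foo.bar WHERE id = 6;

-- Terminal 1
postgres=# ROLLBACK;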

Now that we understand xmin and xmax, when a transaction runs a SELECT on a table, the records visible to it are, in simplified terms, the tuples with (xmin <= txid_current()) AND (xmax = 0 OR txid_current() < xmax):

select * from foo.bar where id = 2 ;

The preceding SQL issued by a transaction internally uses the following logic:

select * from foo.bar where id = 2 AND (xmin <= txid_current()) AND (xmax = 0 OR txid_current() < xmax);

ctid

ctid is the system column that denotes the physical location of a tuple within a Postgres table. It is unique for each tuple and contains the page (block) number along with the tuple's index within that page. For example, the log in Step 16 shows that all the tuples are stored in page 0, along with their positions within that page.
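
A sketch of the query used in Step 16:

postgres=# SELECT ctid, * FROM foo.bar;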

pageinspect

We are going to discuss extensions in PostgreSQL in future chapters. For now, consider an extension as a piece of external code that can be plugged into an existing Postgres installation to provide specific functionality. pageinspect is an extension shipped with the contrib modules that shows the contents of a page. All the tuples of a table are stored in one or more pages, and this extension gives granular visibility into the contents stored inside each page.

To create this extension, we shall just issue the command seen in Step 17:

postgres=# CREATE EXTENSION pageinspect ;
CREATE EXTENSION

-- Verify
postgres=# \dx
                         List of installed extensions
    Name     | Version |   Schema   |                      Description
-------------+---------+------------+--------------------------------------------------------
 pageinspect | 1.6     | public     | inspect the contents of database pages at a low level
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

Among the functions this extension provides, the ones we use here are the following:

get_raw_page         : reads the specified 8 KB page of a relation
heap_page_items      : shows metadata and data of each tuple in a heap page
heap_page_item_attrs : same as heap_page_items, but also returns each tuple's raw attribute values

From the previous log, we saw that there are six records left after deleting one record from the table. But has the record really been removed from the underlying page? Let's look at what is stored inside the page.

As there are very few tuples inside the table, we can see from the following output that there is only 1 page of size 8 KB for this table:

$ psql -d postgres -c "select relname, relpages from pg_class where oid = 'foo.bar'::regclass::oid"
relname | relpages
---------+----------
bar | 1
(1 row)

$ psql -c "show block_size"
block_size
------------
8192
(1 row)

The page sequence starts from 0. So, we shall use pageinspect to see what is inside page 0:

$ psql -d postgres -c "SELECT t_xmin, t_xmax, t_field3 as t_cid, t_ctid FROM
heap_page_items(get_raw_page('foo.bar',0))"
t_xmin | t_xmax | t_cid | t_ctid
--------+--------+-------+--------
11705 | 0 | 0 | (0,1)
11705 | 0 | 0 | (0,2)
11705 | 0 | 0 | (0,3)
11705 | 0 | 0 | (0,4)
11705 | 0 | 0 | (0,5)
11905 | 11911 | 0 | (0,6)
11907 | 11910 | 0 | (0,7)
(7 rows)

In the earlier SELECT output, there was no tuple with ctid = (0,7), because we had deleted (and committed) the record with id = 7. Is it surprising that the tuple is still not gone from the page? It shouldn't be anymore, because, as discussed earlier, PostgreSQL keeps old row versions (its equivalent of undo) inside the table itself. So, a tuple that was deleted earlier is still physically stored in the table until a cleanup process removes it. That cleanup process (VACUUM) removes it only when no transaction still depends on the deleted record. In the following log, we can see that the record with this ctid has xmax_committed set to t (true), which means that a delete was issued by transaction ID 11910 and it got committed:

postgres=# \x
Expanded display is on.

postgres=# SELECT lp,
t_ctid AS ctid,
t_xmin AS xmin,
t_xmax AS xmax,
(t_infomask & 128)::boolean AS xmax_is_lock,
(t_infomask & 1024)::boolean AS xmax_committed,
(t_infomask & 2048)::boolean AS xmax_rolled_back,
(t_infomask & 4096)::boolean AS xmax_multixact,
t_attrs[1] AS p_id,
t_attrs[2] AS p_val
FROM heap_page_item_attrs(
get_raw_page('foo.bar', 0),
'foo.bar'
) WHERE lp = 7;
-[ RECORD 1 ]----+-----------
lp | 7
ctid | (0,7)
xmin | 11907
xmax | 11910
xmax_is_lock | f
xmax_committed | t
xmax_rolled_back | f
xmax_multixact | f
p_id | \x07000000
p_val | \x09617669

There's more...

So far, in the previous sections, we have understood how MVCC works in PostgreSQL. The conclusion is that several row versions may be maintained within each table as a result of deletes and updates. Over a period of time, many such deleted records may still be stored in each page. Such records/tuples are called dead tuples, while the tuples that are inserted and remain unmodified are called live tuples. Dead tuples occupy space and may decrease the performance of queries on the database. How should we manage these dead tuples? Should we perform periodic manual maintenance, or is it taken care of automatically? If it's automatic, which process does that job? The answer to all of these questions is VACUUM. Let's learn about it in detail now.

When you start Postgres, you should see a list of background processes running. These processes (also known as utility processes) each take on a specific responsibility to serve users in the best possible way. One of them is the autovacuum launcher process, which is responsible for starting VACUUM and ANALYZE tasks on tables.
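
One way to list these background processes on a Linux host is the following (a sketch; the exact set of processes varies with the PostgreSQL version and configuration):

$ ps -ef | grep postgres
# Among the output, look for processes such as the checkpointer, background writer,
# walwriter, autovacuum launcher, and logical replication launcher.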

VACUUM cleans up dead tuples so that the space they occupy can be reused by future inserts (remember that an update is effectively a delete plus an insert), whereas ANALYZE collects the statistics of a table so that the execution plan prepared by the planner for queries against that table is optimal. Certain parameters in PostgreSQL (postgresql.conf) are used by the autovacuum launcher to determine when to run an autovacuum vacuum or an autovacuum analyze on a table. We shall learn about tuning autovacuum and its internals in future chapters.
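
As a quick sketch of how to observe dead tuples and clean them up manually (autovacuum normally takes care of this; the counts you see will depend on your own activity):

postgres=# SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
           FROM pg_stat_user_tables
           WHERE schemaname = 'foo' AND relname = 'bar';
postgres=# VACUUM (VERBOSE, ANALYZE) foo.bar;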


Key benefits

  • Implement PostgreSQL 13 features to perform end-to-end modern database management
  • Design, manage, and build enterprise database solutions using a unique recipe-based approach
  • Solve common and not-so-common challenges faced while working to achieve optimal database performance

Description

PostgreSQL has become the most advanced open source database on the market. This book follows a step-by-step approach, guiding you effectively in deploying PostgreSQL in production environments. The book starts with an introduction to PostgreSQL and its architecture. You’ll cover common and not-so-common challenges faced while designing and managing the database. Next, the book focuses on backup and recovery strategies to ensure your database is steady and achieves optimal performance. Throughout the book, you’ll address key challenges such as maintaining reliability, data integrity, a fault-tolerant environment, a robust feature set, extensibility, consistency, and authentication. Moving ahead, you’ll learn how to manage a PostgreSQL cluster and explore replication features for high availability. Later chapters will assist you in building a secure PostgreSQL server, along with covering recipes for encrypting data in motion and data at rest. Finally, you’ll not only discover how to tune your database for optimal performance but also understand ways to monitor and manage maintenance activities, before learning how to perform PostgreSQL upgrades during downtime. By the end of this book, you’ll be well-versed with the essential PostgreSQL 13 features to build enterprise relational databases.

Who is this book for?

This PostgreSQL book is for database architects, database developers and administrators, or anyone who wants to become well-versed with PostgreSQL 13 features to plan, manage, and design efficient database solutions. Prior experience with the PostgreSQL database and SQL language is expected.

What you will learn

  • Understand logical and physical backups in Postgres
  • Demonstrate the different types of replication methods possible with PostgreSQL today
  • Set up a high availability cluster that provides seamless automatic failover for applications
  • Secure a PostgreSQL server through authentication, authorization, and auditing
  • Analyze the live and historic activity of a PostgreSQL server
  • Understand how to monitor critical services in Postgres 13
  • Manage maintenance activities and performance tuning of a PostgreSQL cluster

Product Details

Publication date : Feb 26, 2021
Length : 344 pages
Edition : 1st
Language : English
ISBN-13 : 9781838641054





Table of Contents

13 Chapters
Cluster Management Fundamentals
Cluster Management Techniques
Backup and Recovery
Advanced Replication Techniques
High Availability and Automatic Failover
Connection Pooling and Load Balancing
Securing through Authentication
Logging and Analyzing PostgreSQL Servers
Critical Services Monitoring
Extensions and Performance Tuning
Upgrades and Patches
About Packt
Other Books You May Enjoy

Customer reviews

Rating: 4 out of 5 (12 ratings)
5 star: 41.7% | 4 star: 33.3% | 3 star: 8.3% | 2 star: 16.7% | 1 star: 0%

Andrey Apr 18, 2021
Rating: 5/5
Covers all the area. Very good book! Highly recommended!That’s exactly what you need to run a secure, fault tolerant and fast database!
Amazon Verified review
filippos Apr 08, 2021
Rating: 5/5
According to the books description it provides recipes to build high-performance and fault-tolerant solutions. I could not agree more that this is exactly what it does. It's a rather compact book of 300 pages but manages to accumulate 120 recipes that are useful for the day to day work of a postgresql engineer. It could be a scheduled task like upgrading a cluster or setting up monitoring but also a sudden issue that requires immediate attention like identifying slow queries, getting query plans and finding the culprit. In all cases, this book provides solutions. Not worrying too much about the theoretical struggle but focusing on the solutions. However, i particularly enjoyed the MVCC part, which even though theoretical, was short and efficient, exactly as is should be. I also liked the structure of the chapters, i.e chapter 3 refers to taking a backup, chapter 4 uses parts of chapter 3 when creating a slave and chapter 5 uses the slave from chapter 4 to create a highly available system. Prometheus in monitoring and pg_repack, pg_prewarm and pg_stat_statements in the extensions were also parts that are not very often found in such books but can help immensely an engineer in his work. If I could add a recommendation I would suggest that in some cases I would prefer to have a little more information on the problem the is solved rather than just having the solution. For example when running pg_repack to rebuild tables, maybe there could be some more info on how someone identified the specific table needs to be rebuilt and what process led to this conclusion. Overall it's a great book, highly recommended to all postgresql professionals who want to have a point of reference they can use daily.
Amazon Verified review
Konstantin Kondakov Jun 07, 2021
Rating: 5/5
I have read dozens of PostgreSQL books - this one by far the best one. The level of technical details and best practices is truly amazing. if you need great "cheat sheet" on PostgreSQL with over 120 recipes how to make the best out of the PG engine
Amazon Verified review
Cameron S May 24, 2021
Rating: 5/5
If you're a DBA experienced with another RDBMS who is coming up to speed on PostgreSQL, or similarly in a DevOps role for PostgreSQL, this book is filled excellent "recipes" to help you clearly see the path to get the job done.I don't mean to imply this is a beginner's book -- far from that. The first chapter dives into cluster configuration and management. So it does assume at least an awareness of the needs of setting up a modern RDBMS environment.For every task identified, an overview is provided, followed by the specific statements necessary to set up the related configuration. Related variation, features and options are all explained clearly.As another reviewer commented, this is the book you want to have from the start of your ops career with Postgres.
Amazon Verified review
Amazon Customer Jun 01, 2021
Rating: 5/5
This book gives you what you need to build a good PostgreSQL server by walking you through each step needed to build a professional setup. That setup covers installation, streaming replications, auto failover, security, backups, restoration, basic tuning, and monitoring.While this books covers the pure PostgreSQL solutions, it also covers many 3rd party solutions such as pgBackRest, pgBouncer, Patroni, pgBadger, Prometheus, Grafana, pg_repack, pgAudit, etc. I use most of these 3rd party software’s myself or I use a competing one that are just as good, for example I use Collectd/Graphite/Grafana instead of node_exporter/Prometheus/Grafana, but that is because I am do not need to monitor Docker/Kubernetes clusters. Both are excellent solutions. The outcome is the same, I have the results of the monitoring displayed in Grafana.This book covers all the same topics that I teach at PostgreSQL User Group meetings and present as a PostgreSQL speaker at conferences and I would recommend it to all new database administrators or people who are upgrading from an older version of PostgreSQL to PostgreSQL 13.While I do agree with KS's review about adding more to the book, that would also make the book too thick to read. This is a good general book and if you need more details on each subject then you need to get a book specific to that subject, such as SQL language, etc. No one book can cover everything, otherwise it would just be too thick to read and too expensive to buy. This book is not for learning PostgreSQL's SQL language or for high performance tuning of PostgreSQL. If you want to learn PostgreSQL's SQL language, then you will want to start with Learn PostgreSQL, for the basics, and then move on to Mastering PostgreSQL 13 for the advanced SQL. To tune PostgreSQL for high performance, I personally use PostgreSQL 10 High Performance by Gregory Smith.The one issue that I found with this book, is that there were a few editing mistakes, but that is to be expected with a book over 300 pages. When the book was updated from PostgreSQL 12 to PostgreSQL 13 a few of the reference's still reference PostgreSQL 12 when they should now reference PostgreSQL 13. This is easy enough for the reader to catch and type correctly.This book, like all others, only covers the basics of monitoring, this is because they are trade secrets of each database administrator and the companies that sell monitoring solutions. If you make yourself familiar with the system tables, views, and functions, then you can write your own queries for monitoring PostgreSQL, just like I have personally done. It is not hard; it just takes patience.Please Note: This physical copy of PostgreSQL 13 Cookbook was donated to me by Packt Publishing, but the views are mine alone based on being a PostgreSQL Database Administrator for the last 20 years aka 2001. I also already had access to this book through my personally paid for Packt Publishing subscription which included this book electronically and I have also previously purchased many PostgreSQL books from Packt Publishing.
Amazon Verified review
