This book aims to be a practical guide. For this reason, in the following chapters, you will see several code examples. Instead of building ad-hoc examples for every feature, the book references a small database from a real-world application, in order to show how you can improve your own database with the features covered.
The example database, named testdb, is inspired by an asset-management software that stores file metadata and related tags. A file is something that is stored on a disk and is identified by properties such as a name, a hash (of its content), and a size on the disk. Each file can be categorized with tags, which are labels that are attached to the file itself.
Listing 1 shows the SQL code that generates the structure of the file table, which has the following columns:
- pk is a surrogate key that is automatically generated by a sequence
- f_name represents the file name on the disk
- f_size represents the size of the file on the disk
- f_type is a textual representation of the file type (for example, MP3 for a music file)
- f_hash represents a hash of the content of the file
We might want to prevent the addition of two files with the same content hash. In this case, the f_hash column works as a unique key. Another optional constraint is related to file size; since every file on a disk has a size greater or equal to zero (bytes), it is possible to force the f_size column to store only non-negative values. Similarly, the name of the file cannot be unspecified. More constraints can be added; we will cover some of these in the following chapters.
CREATE TABLE IF NOT EXISTS files (
pk int GENERATED ALWAYS AS IDENTITY,
f_name text NOT NULL,
f_size numeric(15,4) DEFAULT 0,
f_hash text NOT NULL DEFAULT 'N/A',
f_type text DEFAULT 'txt',
ts timestamp DEFAULT now(),
PRIMARY KEY ( pk ),
UNIQUE ( f_hash ),
CHECK ( f_size >= 0 )
);
Listing 1: Code to create the files table
Listing 2 shows the structure of the tags table:
- pk is a surrogate key that is automatically generated by a sequence.
- t_name is the tag name.
- t_child_of is a self-reference to the tuple of another tag. Tags can be nested into each other to build a hierarchy of tags. As an example, let's say the photos tag contains the family and trips tags; these are children of the photos tag. The same tag can appear in different hierarchies, but cannot appear twice in the same position of the same hierarchy. For this reason, a unique constraint over the tag name and its relationship is enforced.
CREATE TABLE IF NOT EXISTS tags(
pk int GENERATED ALWAYS AS IDENTITY,
t_name text NOT NULL,
t_child_of int,
PRIMARY KEY ( pk ),
FOREIGN KEY ( t_child_of ) REFERENCES tags( pk ),
UNIQUE( t_name, t_child_of )
);
Listing 2: SQL code to generate the tags table structure
Since a file can have multiple tags, a join table has been used to instantiate a many-to-many relationship. Listing 3 shows a join table, which is named j_files_tags. This simply stores a relationship between the tuple of a file and the tuple of a tag, allowing only one association between a file and a tag.
CREATE TABLE IF NOT EXISTS j_files_tags (
pk int GENERATED ALWAYS AS IDENTITY,
f_pk int,
t_pk int,
PRIMARY KEY ( pk ),
UNIQUE( f_pk, t_pk ),
FOREIGN KEY ( f_pk ) REFERENCES files( pk ) ON DELETE CASCADE,
FOREIGN KEY ( t_pk ) REFERENCES tags( pk ) ON DELETE CASCADE
);
Listing 3: SQL code to create a join table to match files and tag tables data
There are also some other tables that are used to demonstrate particular scenarios. The first is named archive_files, and has the same structure as the files table. The other is named playlist, and represents a very minimalistic music playlist with filenames and a simple structure, as shown in listing 4:
CREATE TABLE IF NOT EXISTS playlist (
pk int GENERATED ALWAYS AS IDENTITY,
p_name text NOT NULL,
PRIMARY KEY ( pk )
);
Listing 4: SQL code to generate the playlist table
We can either construct these tables by hand or by using one of the scripts provided with the book code snippets from the code repository, located at https://github.com/PacktPublishing/PostgreSQL-11-Quick-Start-Guide. In this case, the tables will also be populated with some test data that we can use to show the results of queries that we will be running in the following chapters.
As you can see, each table column has a prefix letter that identifies the table to which it belongs. For example, f_name has a prefix, f, that reminds the files table it belongs to. This can help us to discriminate the columns of a table when they are joined or when a complex query is built. Throughout the examples in this book, the same concept will be applied to different objects. For example, a function will have a name starting with an f, a procedure with a p, a trigger with tr, and so on. While this is not a commonly-used best practice, it can help us to understand the type of an object and its context from its name.
All the examples shown in this book have been tested and run on PostgreSQL 11 on FreeBSD. They should work seamlessly on any other PostgreSQL 11 installation. All the code has been run through the official psql command line client, even if it is possible to run them with other supported clients (such as pgAdmin4).
Most of the code snippets can be executed as a normal database user. This is emphasized in the code by the psql default prompt, which is as follows:
testdb=>
If the code must be run from a database administrator, otherwise known as a superuser, the prompt will change accordingly, as follows:
testdb=#
As well as this, the examples in which superuser privileges are required will be clearly indicated.
Each time we execute a statement via psql, we get a reply that confirms the execution of the statement. If we execute SELECT *, the reply we receive will be a list of tuples. If we execute other statements, we will get a tag that represents the execution of the statement. This is demonstrated in the following examples:
testdb=> INSERT INTO playlist VALUES( ... );
INSERT
testdb=> LISTEN my_channel;
LISTEN
testdb=> CREATE FUNCTION foo() RETURNS VOID AS $$ BEGIN END $$ LANGUAGE plpgsql;
CREATE FUNCTION
So that we can focus on the important parts of a code snippet, we will remove the output reply of each statement execution if it is not important. The preceding listing can therefore be represented in a more concise way, as follows:
testdb=> INSERT INTO playlist VALUES( ... );
testdb=> LISTEN my_channel;
testdb=> CREATE FUNCTION foo() RETURNS VOID AS $$ BEGIN END $$ LANGUAGE plpgsql;
In this way, you will see only the commands and the statements that you have to insert into the server connection.