PostgreSQL 16 Administration Cookbook
Solve real-world Database Administration challenges with 180+ practical recipes and best practices

Product type: Paperback
Published: Dec 2023
Publisher: Packt
ISBN-13: 9781835460580
Length: 636 pages
Edition: 1st Edition

Authors (5):
Boriss Mejías, Jimmy Angelakos, Simon Riggs, Gianni Ciolli, Vibhor Kumar
Table of Contents (15 chapters)

Preface
1. First Steps
2. Exploring the Database (free chapter)
3. Server Configuration
4. Server Control
5. Tables and Data
6. Security
7. Database Administration
8. Monitoring and Diagnosis
9. Regular Maintenance
10. Performance and Concurrency
11. Backup and Recovery
12. Replication and Upgrades
13. Other Books You May Enjoy
14. Index
Performing actions on many tables

As a database administrator, you will often need to apply multiple commands as part of the same overall task. This task could be one of the following:

  • Performing many different actions on multiple tables
  • Performing the same action on multiple tables
  • Performing the same action on multiple tables in parallel
  • Performing different actions, one on each table, in parallel

The first is a general case where you need to make a set of coordinated changes. The solution is to write a script, as we’ve already discussed. We can also call this static scripting because you write the script manually and then execute it.

The second type of task can be achieved very simply with dynamic scripts, where we write a script that writes another script. This technique is the main topic of this recipe.

Performing actions in parallel sounds cool, and it would be useful if it were easy. In some ways, it is, but trying to run multiple tasks concurrently and trap and understand all the errors is much harder. And if you’re thinking it won’t matter if you don’t check for errors, think again. If you run tasks in parallel, then you cannot run them inside the same transaction, so you need error handling in case one part fails.

Don’t worry! Running in parallel is usually not as bad as it may seem after reading the previous paragraph, and we’ll explain it after looking at a few basic examples.

Getting ready

Let’s create a basic schema to run some examples on:

postgres=# create schema test;
CREATE SCHEMA
postgres=# create table test.a (col1 INTEGER);
CREATE TABLE
postgres=# create table test.b (col1 INTEGER);
CREATE TABLE
postgres=# create table test.c (col1 INTEGER);
CREATE TABLE

How to do it…

Our task is to run a SQL statement of the following form, with X replaced by each table name, against each of our three test tables:

ALTER TABLE X
ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp;

The steps are as follows:

  1. Our starting point is a script that lists the tables that we want to perform tasks against – something like the following:
    postgres=# SELECT n.nspname, c.relname
        FROM pg_class c
        JOIN pg_namespace n
        ON c.relnamespace = n.oid
        WHERE n.nspname = 'test'
        AND c.relkind = 'r';
    
  2. This displays the list of tables that we will act upon (so that you can check it):
     nspname | relname
    ---------+---------
     test    | a
     test    | b
     test    | c
    (3 rows)
    
  3. We can then use the preceding SQL to generate the text for a SQL script, substituting the schema name and table name in the SQL text:
    postgres=# SELECT format('ALTER TABLE %I.%I ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp;'
    , n.nspname, c.relname )
    FROM pg_class c
    JOIN pg_namespace n
    ON c.relnamespace = n.oid
    WHERE n.nspname = 'test'
    AND c.relkind = 'r';
    
  4. Finally, we can run the script and watch the results (success!):
    postgres=# \gexec
    ALTER TABLE
    ALTER TABLE
    ALTER TABLE
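The generate-then-execute pattern that \gexec automates can be sketched outside psql as well. The following Python sketch hardcodes the table list in place of the pg_class query (names taken from the example above); the point is that the generated script exists as data you can review before anything runs:

```python
# Minimal sketch of the generate-then-review-then-execute pattern that
# \gexec automates. The hardcoded table list stands in for the pg_class
# query; real code would fetch it from the catalog.
tables = [('test', 'a'), ('test', 'b'), ('test', 'c')]

statements = [
    f'ALTER TABLE {schema}.{table} '
    'ADD COLUMN last_update_timestamp '
    'TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp;'
    for schema, table in tables
]

# Review step: the script exists as data before anything is executed.
for stmt in statements:
    print(stmt)
```

The important property is the separation between generation and execution: you can inspect, save, or edit the generated statements before feeding them to the server.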
    

How it works…

Overall, this is just an example of dynamic scripting, and it has been used by DBAs for many decades, even before PostgreSQL was born.

The \gexec meta-command executes each row of the preceding query’s result as a SQL statement, so be very careful to test your query before you run it in production.

The format function takes a template string as its first argument and replaces each occurrence of %I with the corresponding additional argument (in our case, the values of n.nspname and c.relname).

%I treats the value as a SQL identifier, adding double quotes as appropriate. This is extremely important if some joker or attacker creates a table like this:

postgres=# create table test."; DROP TABLE customer;" (col1 INTEGER);

If the script used just %s rather than %I, it would generate the following SQL, and running it would drop the customer table if one exists. So, for security purposes, you should use %I:

ALTER TABLE test.a ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp;
ALTER TABLE test.; DROP TABLE customer; ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp;
ALTER TABLE test.b ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp;
ALTER TABLE test.c ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT current_timestamp;
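To see why %I neutralizes the attack, here is a minimal Python sketch of the identifier-quoting rule (an illustration only, not PostgreSQL’s actual implementation, which also treats reserved keywords as needing quotes): a name passes through unchanged only if it is a plain lower-case identifier; anything else is wrapped in double quotes, with embedded quotes doubled:

```python
import re

# Hypothetical helper mimicking the %I quoting rule: pass a name through
# unquoted only if it is a plain lower-case identifier; otherwise wrap it
# in double quotes and double any embedded double quotes.
def quote_ident(name: str) -> str:
    if re.fullmatch(r'[a-z_][a-z0-9_]*', name):
        return name
    return '"' + name.replace('"', '""') + '"'

# An ordinary name is left alone:
print(quote_ident('a'))
# The attacker's table name survives only as one oddly named identifier:
print(quote_ident('; DROP TABLE customer;'))
```

Quoted this way, the malicious name becomes a single (strange) identifier rather than executable SQL.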

Dynamic scripting is a quick and dirty approach. Note that the c.relkind = 'r' filter matches only ordinary tables; partitioned tables (relkind = 'p') and foreign tables (relkind = 'f') are skipped, so you’ll need to extend the filter yourself if you want to cover them.

There is another way of doing this as well:

DO $$
DECLARE
    t record;
BEGIN
    FOR t IN SELECT c.*, n.nspname
             FROM pg_class c
             JOIN pg_namespace n
             ON c.relnamespace = n.oid
             WHERE n.nspname = 'test'
             AND c.relkind = 'r'  /* ; not needed */
    LOOP
        EXECUTE format(
            'ALTER TABLE %I.%I
             ADD COLUMN last_update_timestamp
             TIMESTAMP WITH TIME ZONE'
            , t.nspname, t.relname);
    END LOOP;
END $$;

I prefer not to use this method because it executes the SQL directly, so you cannot review the generated statements before they run or keep the script afterward.

The preceding syntax with DO is called an anonymous code block because it’s like a function without a name.

There’s more…

Earlier, I said I’d explain how to run multiple tasks in parallel. A few practical approaches are possible, with a bit of discussion.

Making tasks run in parallel can be thought of as subdividing the main task so that we run 2, 4, 8, or more subscripts rather than one large script.

First, you should note that error checking gets harder as you spawn more parallel tasks, whereas performance improves the most for the first few subdivisions. Intensive tasks are also often constrained by CPU, RAM, or I/O resources. This means that splitting the main task into more than a few parallel subtasks isn’t practical without some kind of tool to help us manage them.

There are two approaches here, depending on the two types of tasks:

  • A task consists of many smaller tasks, all roughly of the same size.
  • A task consists of many smaller tasks, and the execution times vary according to the size and complexity of the database object.

If we have lots of smaller tasks, then we can simply run our scripts multiple times using a simple round-robin split so that each subscript runs a part of all the subtasks. Here is how to do it: each row in pg_class has a hidden column called oid, whose value is a 32-bit number that’s allocated from an internal counter on table creation. Therefore, about half of the tables will have even values of oid, and we can achieve an even split by adding the following clauses:

  • Script 1: Add WHERE c.oid % 2 = 0
  • Script 2: Add WHERE c.oid % 2 = 1
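The effect of the modulo split can be sketched in a few lines of Python (the oid values below are invented for illustration):

```python
# Sketch of the oid % 2 split: oids are plain integers, so each table is
# routed to exactly one subscript, and the two subscripts together cover
# every table. The oid values here are invented.
tables = {16384: 'a', 16387: 'b', 16390: 'c', 16393: 'd'}

N = 2  # number of parallel subscripts
scripts = {i: [name for oid, name in tables.items() if oid % N == i]
           for i in range(N)}
print(scripts)  # each table appears in exactly one script
```

Because oids come from a single incrementing counter, tables created over time tend to be spread evenly between the even and odd buckets.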

Here, we added a column to many tables. In the last example, we added the column with no specified default, so the new column contains NULL values, and the ALTER TABLE completes very quickly, even on large tables. If we change the ALTER TABLE statement to specify a default, then we should choose a non-volatile expression for the default value; otherwise, PostgreSQL will need to rewrite the entire table, and the runtime will vary with the table’s size (and, approximately, with the number and type of its indexes).

Now that our subtasks vary in runtime according to their size, we need to be more careful when splitting them up, so that we end up with multiple scripts that will run for about the same time.

If we already know that we have just a few big tables, it’s easy to split them manually into scripts.

If the database contains many large tables, then we can sort SQL statements by table size and then distribute them using round-robin distribution into multiple subscripts that will have approximately the same runtime. The following is an example of this technique, which assumes you have multiple large tables in a schema called test.

First, create a table with all the SQL you would like to run:

CREATE TABLE run_sql AS
SELECT format('ALTER TABLE %I.%I ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT now();', n.nspname, c.relname) AS sql,
       row_number() OVER (ORDER BY pg_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n
   ON c.relnamespace = n.oid
WHERE n.nspname = 'test'
  AND c.relkind = 'r';

Then, create a file called exec-script.sql and place the following code in it:

SELECT sql FROM run_sql
WHERE row_number % 2 = :i
ORDER BY row_number DESC
\gexec

Then, we run the script twice, as follows:

$ psql -v i=0 -f exec-script.sql &
$ psql -v i=1 -f exec-script.sql &

Note how we used the psql parameters – via the -v command-line option – to select different rows using the same script.

Also, note how we used the row_number() window function to sort the data by size. Then, we split the data into pieces using the following line:

WHERE row_number % N = i;

Here, N is the total number of scripts we’re producing, and i ranges from 0 to N - 1 (we are using modulo arithmetic to distribute the subtasks).
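The sort-then-deal logic behind run_sql can be sketched in Python (table names and sizes invented for illustration): order the tables by size, number the rows, and deal them out round-robin with row_number % N:

```python
# Sketch of the size-balancing split used by run_sql: order the tables
# by size, number them, then deal them out round-robin with
# row_number % N. Table names and sizes are invented for illustration.
tables = [('big1', 900), ('big2', 850), ('mid', 300),
          ('small1', 40), ('small2', 30), ('small3', 20)]

N = 2
ranked = sorted(tables, key=lambda t: t[1])   # ORDER BY pg_relation_size
scripts = {i: [] for i in range(N)}
for row_number, (name, size) in enumerate(ranked, start=1):
    scripts[row_number % N].append(name)

# Each script gets a mix of small and large tables, so the total
# workloads end up roughly comparable.
print(scripts)
```

Round-robin over a sorted list is not a perfect bin-packing, but for tables whose sizes follow a typical skewed distribution it keeps the per-script totals close enough in practice.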
