
Using SQL and developer features

In this section, we will discuss some of the most desired developer and SQL features that have made it into PostgreSQL 17.

Teaching COPY error handling

Let us start with my personal favorite: COPY is finally able to handle errors in a reasonably graceful way. Many people have been frustrated by the old error-handling behavior: standard PostgreSQL stops COPY as soon as it hits the first faulty row. It is good to see that this vital functionality has made it into the official release of PostgreSQL.

The main question that arises is: how can we make use of this feature? Here is a simple command to create a sample table:

test=# CREATE TABLE t_data (
    id    int,
    data  text
);
CREATE TABLE

The goal is to import the following dataset (stored in /tmp/file.txt) into the table and make sure that errors are handled properly:

1  hans
2  paul
abc  joe
4  jane
def  james
5  laura

What we see here is that some of the data is definitely wrong: the id column expects integers, but the third and fifth rows contain text instead. The following listing proves this beyond doubt:

test=# COPY t_data FROM '/tmp/file.txt';
ERROR:  invalid input syntax for type integer: "abc"
CONTEXT:  COPY t_data, line 3, column id: "abc"

Fortunately, we can handle this kind of problem in PostgreSQL 17 and simply ignore the error:

test=# COPY t_data FROM '/tmp/file.txt'
WITH (ON_ERROR 'ignore');
NOTICE:  2 rows were skipped due to data type incompatibility
COPY 4
test=# SELECT * FROM t_data ;
 id | data
----+-------
  1 | hans
  2 | paul
  4 | jane
  5 | laura
(4 rows)

PostgreSQL will import the data and simply skip over the invalid rows. A NOTICE message indicates how many rows have been skipped. As of version 17, two ON_ERROR settings are supported: stop (the default) and ignore. It is likely that more options will become available in future releases.
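If we also want to see which rows were rejected, PostgreSQL 17 additionally accepts a LOG_VERBOSITY option for COPY. Set to verbose, it emits one NOTICE per skipped row. A minimal sketch, reusing the same input file:

test=# COPY t_data FROM '/tmp/file.txt'
WITH (ON_ERROR 'ignore', LOG_VERBOSITY 'verbose');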

Splitting and merging partitions

In the previous years, there has not been a single version of PostgreSQL that has not provided relevant improvements to partitioning as a whole. The same holds true for the new release. This time, the development team has been working on splitting and merging partitions, which has been a frequent requirement over the years.

The following listing shows how a simple table including a partition can be created:

CREATE TABLE t_timeseries (
    id serial,
    d date,
    payload text
) PARTITION BY RANGE (d);
CREATE TABLE t_timeseries_2024
PARTITION OF t_timeseries
FOR VALUES FROM ('2024-01-01')
TO ('2025-01-01');

Here is a typical example one would encounter in real life: we have some kind of time series, and we are using range partitions to split the data into smaller chunks for various reasons (faster cleanup, scalability, and so on). However, individual partitions often grow too large and have to be broken up into smaller pieces. This is where PostgreSQL can do some of the heavy lifting for us:

ALTER TABLE t_timeseries
   SPLIT PARTITION t_timeseries_2024
   INTO (
     PARTITION t_timeseries_2024_h1
    FOR VALUES FROM ('2024-01-01') TO ('2024-07-01'),
  PARTITION t_timeseries_2024_h2
    FOR VALUES FROM ('2024-07-01') TO ('2025-01-01')
);

What we do here is take our partition and split it into two new chunks, each containing roughly half of the data. Note that a single command takes care of the entire operation.
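To verify the result, we can inspect the partition tree. One convenient way is the pg_partition_tree function (available since PostgreSQL 12), which should yield something like the following:

test=# SELECT relid, isleaf, level
       FROM pg_partition_tree('t_timeseries');
        relid         | isleaf | level
----------------------+--------+-------
 t_timeseries         | f      |     0
 t_timeseries_2024_h1 | t      |     1
 t_timeseries_2024_h2 | t      |     1
(3 rows)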

While splitting a partition into smaller pieces might be by far the more common operation, it is also possible to reverse such a decision and unify several partitions into a single entity. The way to do that is the ALTER TABLE … MERGE PARTITIONS … command, which is just as easy to use as the SPLIT command we have observed and tested before:

ALTER TABLE t_timeseries
MERGE PARTITIONS (
    t_timeseries_2024_h1, 
    t_timeseries_2024_h2
)
INTO t_timeseries_2024;

All we have to do here is tell PostgreSQL which partitions are supposed to form the new entity and let the database engine do its magic.
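To convince ourselves that tuple routing works as expected after all this reshuffling, we can insert a couple of rows and check which partition they end up in. A quick sketch using the tableoid system column (the sample values are made up):

test=# INSERT INTO t_timeseries (d, payload)
       VALUES ('2024-03-04', 'a'), ('2024-09-01', 'b');
INSERT 0 2
test=# SELECT tableoid::regclass, * FROM t_timeseries;
     tableoid      | id |     d      | payload
-------------------+----+------------+---------
 t_timeseries_2024 |  1 | 2024-03-04 | a
 t_timeseries_2024 |  2 | 2024-09-01 | b
(2 rows)

Both rows live in the re-unified t_timeseries_2024 partition, just as if it had never been split.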

Turning numbers into binary and octal values

One of the lesser-known features that made it into PostgreSQL 17 is the ability to convert numbers to binary and octal representations. Two overloaded functions have been added – to_bin and to_oct:

test=# \df *to_bin*
                           List of functions
   Schema   |  Name  | Result data type | Argument data types | Type
------------+--------+------------------+---------------------+------
 pg_catalog | to_bin | text             | bigint              | func
 pg_catalog | to_bin | text             | integer             | func
(2 rows)

test=# \df *to_oct*
                           List of functions
   Schema   |  Name  | Result data type | Argument data types | Type
------------+--------+------------------+---------------------+------
 pg_catalog | to_oct | text             | bigint              | func
 pg_catalog | to_oct | text             | integer             | func
(2 rows)

Both functions can be called with 32- or 64-bit integer values. The following listing shows an example of those functions in action:

test=# SELECT to_bin(4711), to_oct(4711);
    to_bin     | to_oct
---------------+--------
 1001001100111 | 11147
(1 row)
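Together with the long-standing to_hex function, this covers the most common number bases. A quick comparison (the values are easy to verify by hand):

test=# SELECT to_bin(255), to_oct(255), to_hex(255);
  to_bin  | to_oct | to_hex
----------+--------+--------
 11111111 | 377    | ff
(1 row)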

Improving MERGE even more

MERGE has been around for a couple of releases (it was introduced in PostgreSQL 15). In SQL, the MERGE command is used to merge data from a source table into a target table. It is useful when you need to update or insert rows based on a common column between the two tables.

PostgreSQL 17 introduces an additional clause, namely, WHEN NOT MATCHED BY SOURCE THEN. This syntax lets us define what should happen to target rows that have no counterpart in the source table, which adds quite some flexibility.

Here is how it works:

CREATE TABLE t_demo (
    a int PRIMARY KEY, 
    b int
);
INSERT INTO t_demo
VALUES (1, 4711), 
       (2, 5822), 
       (3, 6933);
CREATE TABLE t_source (
    a int PRIMARY KEY, 
    b int
);
INSERT INTO t_source
VALUES (2, 6822), 
       (3, 6933), 
       (4, 1252);
MERGE INTO t_demo AS t1
USING t_source AS t2 
ON t1.a = t2.a
WHEN MATCHED THEN
    UPDATE SET b = t1.b * 100
WHEN NOT MATCHED THEN
    INSERT (a, b) VALUES (t2.a, t2.b)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
RETURNING t1.*, t2.*;

This MERGE statement will return the following data:

 a |   b    | a |  b
---+--------+---+------
 1 |   4711 |   |
 2 | 582200 | 2 | 6822
 3 | 693300 | 3 | 6933
 4 |   1252 | 4 | 1252
(4 rows)

RETURNING * can be really useful for debugging the statement as a whole. The same is true in my example: a = 1 is present in the target table but not in the source table, and the row is therefore deleted. In the case of a = 2 and a = 3, we get a full match and, therefore, the UPDATE branch executes. a = 4 is only present in the t_source table and is therefore inserted into the t_demo table.

The following table shows what we can expect to find after the MERGE operation:

 a |   b
---+--------
 2 | 582200
 3 | 693300
 4 |   1252
(3 rows)

As you can see, all three cases defined in the MERGE statement have been executed successfully. The question is: which row was touched by which rule? To answer it, we can modify the RETURNING clause a bit:

RETURNING merge_action(), t1.*, t2.*

In this case, PostgreSQL will provide us with even more information, as we can see in the following listing:

 merge_action | a |   b    | a |  b
--------------+---+--------+---+------
 DELETE       | 1 |   4711 |   |
 UPDATE       | 2 | 582200 | 2 | 6822
 UPDATE       | 3 | 693300 | 3 | 6933
 INSERT       | 4 |   1252 | 4 | 1252
(4 rows)

Additional JSON functionality

The number of JSON-related features has skyrocketed over the years. The same is true for version 17, which provides many more features that make JSON easier to use and more powerful overall. The first thing that caught my attention is that more standard-compliant functions have been added – namely, JSON_EXISTS(), JSON_QUERY(), and JSON_VALUE().
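Here is a minimal sketch of how these three behave (the JSON documents are made up for illustration): JSON_EXISTS checks whether a path matches, JSON_VALUE extracts a single scalar, and JSON_QUERY returns a JSON fragment:

test=# SELECT JSON_EXISTS('{"a": 1}', '$.a') AS found,
              JSON_VALUE('{"a": 1}', '$.a') AS val,
              JSON_QUERY('{"a": [1, 2]}', '$.a') AS fragment;
 found | val | fragment
-------+-----+----------
 t     | 1   | [1, 2]
(1 row)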

What is also noteworthy is JSON_TABLE, which allows us to turn a JSON document into a tabular format in one go. This is pretty similar to what XMLTABLE does.

The syntax might look as follows:

SELECT jt.*
FROM customers,
     JSON_TABLE (
         js, '$.favorites[*]' COLUMNS (
             id FOR ORDINALITY,
             country text PATH '$.country',
             branch text PATH '$.industry[*].branch' WITH WRAPPER,
             ceo text PATH '$.company[*].ceo' WITH WRAPPER
         )
     ) AS jt;

What this does is address the various elements in the JSON document and return them in a tabular format we can actually and safely read.
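Note that the query assumes a customers table with a js column holding a JSON document. A hypothetical document matching those paths might look like this:

test=# CREATE TABLE customers (id int, js jsonb);
test=# INSERT INTO customers (id, js) VALUES (1,
  '{"favorites": [
     {"country": "Austria",
      "industry": [{"branch": "IT"}, {"branch": "Retail"}],
      "company": [{"ceo": "Alice"}]}
   ]}');

With this sample row, the JSON_TABLE query returns one row per element of the favorites array; because of WITH WRAPPER, branch and ceo come back as JSON arrays such as ["IT", "Retail"] and ["Alice"].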

Creating BRIN indexes in parallel

Technically, this is not a developer feature, but given the fact that performance topics are often hard to categorize, I decided to include it here. BRIN indexes are often used in data warehouses to quickly filter data without carrying the overhead of full-blown B-tree indexes. Creating B-trees has long been possible using more than one CPU. In PostgreSQL 17, it is now also possible to create BRIN indexes in parallel, which can greatly speed up the process.
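As with B-tree builds, the number of workers is governed by the max_parallel_maintenance_workers setting. A minimal sketch, reusing the t_timeseries table from earlier (the index name and worker count are illustrative):

test=# SET max_parallel_maintenance_workers = 4;
test=# CREATE INDEX idx_timeseries_brin ON t_timeseries USING brin (d);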
