Using SQL and developer features
In this section, we will discuss some of the most desired developer and SQL features that have made it into PostgreSQL 17.
Teaching COPY error handling
Let us start with my personal favorite: COPY is finally able to handle errors in a reasonably good way. Many people were frustrated by the old error-handling behavior: up to now, PostgreSQL would stop COPY as soon as it hit a single bad row. It is good to see that this vital functionality has made it into the official release of PostgreSQL.
The obvious question is: how can we make use of this feature? Here is a simple command to create a sample table:
test=# CREATE TABLE t_data (
           id   int,
           data text
       );
CREATE TABLE
The goal is to import the following dataset into the table and make sure those errors are handled properly:
1	hans
2	paul
abc	joe
4	jane
def	james
5	laura
What we see here is that the data is clearly broken: two of the rows contain values in the id column that are not valid integers. The following listing proves this beyond doubt:
test=# COPY t_data FROM '/tmp/file.txt';
ERROR:  invalid input syntax for type integer: "abc"
CONTEXT:  COPY t_data, line 3, column id: "abc"
Fortunately, we can handle this kind of problem in PostgreSQL 17 and simply ignore the error:
test=# COPY t_data FROM '/tmp/file.txt' WITH (ON_ERROR 'ignore');
NOTICE:  2 rows were skipped due to data type incompatibility
COPY 4
test=# SELECT * FROM t_data;
 id | data
----+-------
  1 | hans
  2 | paul
  4 | jane
  5 | laura
(4 rows)
PostgreSQL will import the data and simply skip over the invalid rows. A NOTICE message indicates how many rows have been skipped. As of version 17, two ON_ERROR settings are supported: stop (the default) and ignore. It is likely that more options will become available in future releases.
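If you also want to see which rows were rejected, ON_ERROR can be combined with the LOG_VERBOSITY option, which was added in the same release. Set to 'verbose', it makes COPY emit an additional NOTICE for every skipped row. A minimal sketch, reusing the file from above:

-- one NOTICE per skipped row, plus the usual summary at the end
COPY t_data FROM '/tmp/file.txt'
    WITH (ON_ERROR 'ignore', LOG_VERBOSITY 'verbose');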
Splitting and merging partitions
In recent years, there has not been a single PostgreSQL release that did not provide relevant improvements to partitioning, and the same holds true for the new version. This time, the development team has been working on splitting and merging partitions, which has been a frequently requested capability over the years.
The following listing shows how a simple table including a partition can be created:
CREATE TABLE t_timeseries (
    id      serial,
    d       date,
    payload text
) PARTITION BY RANGE (d);

CREATE TABLE t_timeseries_2024
    PARTITION OF t_timeseries
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
Here is a typical example one would encounter in real life: we have some kind of time series and we are using range partitioning to split the data into smaller chunks for various reasons (faster cleanup, scalability, and so on). However, individual partitions often grow too large and have to be broken up into smaller pieces. This is where PostgreSQL can now do some of the heavy lifting for us:
ALTER TABLE t_timeseries
    SPLIT PARTITION t_timeseries_2024 INTO (
        PARTITION t_timeseries_2024_h1
            FOR VALUES FROM ('2024-01-01') TO ('2024-07-01'),
        PARTITION t_timeseries_2024_h2
            FOR VALUES FROM ('2024-07-01') TO ('2025-01-01')
    );
What we do here is take our partition and split it into two new chunks that each contain roughly half of the data. Note that a single command takes care of the entire operation.
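To convince ourselves that rows are routed to the correct chunk after the split, we can inspect the tableoid system column. Here is a minimal sketch; the two sample rows are invented purely for illustration:

INSERT INTO t_timeseries (d, payload)
VALUES ('2024-03-15', 'a'), ('2024-09-01', 'b');

-- tableoid reveals which partition each row physically lives in
SELECT tableoid::regclass AS partition, *
FROM t_timeseries;

The first row should show up in t_timeseries_2024_h1 and the second one in t_timeseries_2024_h2.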
While splitting partitions into smaller pieces might be by far the more common operation, it is also possible to reverse the decision and unify several partitions into a single entity. The way to do that is the ALTER TABLE … MERGE PARTITIONS … command, which is just as easy to use as the SPLIT command we have observed and tested before:
ALTER TABLE t_timeseries
    MERGE PARTITIONS (
        t_timeseries_2024_h1,
        t_timeseries_2024_h2
    ) INTO t_timeseries_2024;
All we have to do here is to tell PostgreSQL which partitions are supposed to form the new entity and let the database engine do its magic.
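If you want to double-check the resulting layout, the pg_partition_tree() function (available since PostgreSQL 12) lists all partitions of a table. A quick sanity check might look like this:

-- show the partition hierarchy after the merge
SELECT relid, parentrelid, isleaf, level
FROM pg_partition_tree('t_timeseries');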
Turning numbers into binary and octal values
One of the lesser-known features that made it into PostgreSQL is the ability to convert numbers to binary and octal representations, respectively. Two overloaded functions have been added – to_bin and to_oct:
test=# \df *to_bin*
                             List of functions
   Schema   |  Name  | Result data type | Argument data types | Type
------------+--------+------------------+---------------------+------
 pg_catalog | to_bin | text             | bigint              | func
 pg_catalog | to_bin | text             | integer             | func
(2 rows)

test=# \df *to_oct*
                             List of functions
   Schema   |  Name  | Result data type | Argument data types | Type
------------+--------+------------------+---------------------+------
 pg_catalog | to_oct | text             | bigint              | func
 pg_catalog | to_oct | text             | integer             | func
(2 rows)
Both functions can be called with 32- or 64-bit integer values. The following listing shows an example of those functions in action:
test=# SELECT to_bin(4711), to_oct(4711);
    to_bin     | to_oct
---------------+--------
 1001001100111 | 11147
(1 row)
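As a side note, negative values are rendered in two's complement notation, just as to_hex() has always done. A quick sanity check, not part of the original listing, that you can run yourself:

-- for a 32-bit integer, this yields 32 one-bits and the octal equivalent
SELECT to_bin(-1), to_oct(-1);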
Improving MERGE even more
MERGE has been around for a couple of releases now. In SQL, the MERGE command is used to merge data from a source table into a target table. It is useful when you need to update, insert, or delete rows based on a common column between the two tables.
The new release of PostgreSQL introduces yet another feature, namely WHEN NOT MATCHED BY SOURCE THEN. This additional syntax allows us to define the desired behavior even more precisely and adds some flexibility.
Here is how it works:
CREATE TABLE t_demo (
    a int PRIMARY KEY,
    b int
);
INSERT INTO t_demo VALUES (1, 4711), (2, 5822), (3, 6933);

CREATE TABLE t_source (
    a int PRIMARY KEY,
    b int
);
INSERT INTO t_source VALUES (2, 6822), (3, 6933), (4, 1252);

MERGE INTO t_demo AS t1
USING t_source AS t2 ON t1.a = t2.a
WHEN MATCHED THEN
    UPDATE SET b = t1.b * 100
WHEN NOT MATCHED THEN
    INSERT (a, b) VALUES (t2.a, t2.b)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
RETURNING t1.*, t2.*;
This MERGE statement will return the following data:
 a |   b    | a |  b
---+--------+---+------
 1 |   4711 |   |
 2 | 582200 | 2 | 6822
 3 | 693300 | 3 | 6933
 4 |   1252 | 4 | 1252
(4 rows)
RETURNING * can be really useful for debugging the statement as a whole. The same is true in my example: a = 1 is present in the target table but not in the source table, so the row is deleted. In the case of a = 2 and a = 3, we have a full match and, therefore, the UPDATE branch executes. a = 4 is only present in the t_source table and is therefore inserted into the t_demo table.
The following table shows what we can expect to find in t_demo after the MERGE operation:
 a |   b
---+--------
 2 | 582200
 3 | 693300
 4 |   1252
(3 rows)
As you can see, all three cases defined in the MERGE statement have been executed successfully. The question is: which row was touched by which rule? To find out, we can modify the RETURNING clause a bit:
RETURNING merge_action(), t1.*, t2.*
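Embedded in the full statement, and assuming t_demo and t_source have been reset to their initial contents (the previous run already modified the data), this reads as follows:

MERGE INTO t_demo AS t1
USING t_source AS t2 ON t1.a = t2.a
WHEN MATCHED THEN
    UPDATE SET b = t1.b * 100
WHEN NOT MATCHED THEN
    INSERT (a, b) VALUES (t2.a, t2.b)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
RETURNING merge_action(), t1.*, t2.*;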
In this case, PostgreSQL will provide us with even more information, as we can see in the following listing:
 merge_action | a |   b    | a |  b
--------------+---+--------+---+------
 DELETE       | 1 |   4711 |   |
 UPDATE       | 2 | 582200 | 2 | 6822
 UPDATE       | 3 | 693300 | 3 | 6933
 INSERT       | 4 |   1252 | 4 | 1252
(4 rows)
Additional JSON functionality
The number of JSON-related features has skyrocketed over the years. The same is true for version 17, which provides many more capabilities that will make JSON easier to use and more powerful overall. The first thing that caught my attention was the fact that more standard-compliant functions have been added – namely, JSON_EXISTS(), JSON_QUERY(), and JSON_VALUE().
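Here is a minimal sketch of the three functions at work; the document and the paths are made up purely for illustration:

SELECT JSON_EXISTS(jsonb '{"a": [1, 2, 3]}', '$.a[2]'),  -- does the element exist?
       JSON_VALUE(jsonb '{"a": [1, 2, 3]}',
                  '$.a[1]' RETURNING int),               -- extract a scalar value
       JSON_QUERY(jsonb '{"a": [1, 2, 3]}', '$.a');      -- extract a JSON fragment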
What is also noteworthy is JSON_TABLE, which allows us to turn a JSON document into a tabular format in one go. This is pretty similar to what XMLTABLE does.
The syntax might look as follows:
SELECT jt.*
FROM customers,
     JSON_TABLE (
         js, '$.favorites[*]'
         COLUMNS (
             id FOR ORDINALITY,
             country text PATH '$.country',
             branch  text PATH '$.industry[*].branch' WITH WRAPPER,
             ceo     text PATH '$.company[*].ceo' WITH WRAPPER
         )
     ) AS jt;
What this does is address various elements in the JSON document and return them in a tabular format that we can actually read safely and conveniently.
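Note that the customers table and its js column are not defined anywhere in this section. A matching toy setup, invented here purely so that the query above has something to operate on, could look like this:

CREATE TABLE customers (
    id serial PRIMARY KEY,
    js jsonb
);

-- one customer with a single favorites entry matching the paths above
INSERT INTO customers (js) VALUES (
    '{"favorites": [
        {"country":  "Austria",
         "industry": [{"branch": "consulting"}],
         "company":  [{"ceo": "Jane Doe"}]}
    ]}'
);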
Creating BRIN indexes in parallel
Technically, this is not a developer feature, but given that performance topics are often hard to categorize, I decided to include it here. BRIN indexes are often used in data warehouses to quickly filter data without carrying the overhead of full-blown B-tree indexes. Creating B-tree indexes using more than one CPU core has long been possible. As of PostgreSQL 17, it is now also possible to build BRIN indexes in parallel, which can greatly speed up the process.
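Here is a minimal sketch of what this looks like in practice, reusing the t_timeseries table from earlier in this section. Note that the parameter only raises the ceiling for parallel workers; PostgreSQL still decides how many workers are actually used based on the table size:

-- allow up to 4 parallel workers for maintenance operations
SET max_parallel_maintenance_workers = 4;

-- on a sufficiently large table, the BRIN build can now run in parallel
CREATE INDEX idx_t_timeseries_brin_d
    ON t_timeseries USING brin (d);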