PostgreSQL 13 is a major milestone and many big and small features have made it into the database core this time. In this chapter, you will be introduced to the most important developments in the PostgreSQL world. Let's get started and see what the developers have come up with.
Digging into SQL and developer-related topics
PostgreSQL 13 provides some new features that are especially important to developers.
Improving psql command-line handling
In PostgreSQL 13, a small enhancement has been committed that makes it easier to track the state of a session in psql. You can now see at a glance whether a transaction is running successfully or not. Let's take a look at a simple listing:
test=# BEGIN;
BEGIN
test=*# SELECT 1;
?column?
----------
1
(1 row)
test=*# SELECT 1 / 0;
ERROR: division by zero
test=!# SELECT 1 / 0;
ERROR: current transaction is aborted, commands ignored until end of transaction block
test=!# COMMIT;
ROLLBACK
What is so special about this code block? The thing to notice is that the prompt changes. The reason is that %x has been added to the default PROMPT1 and PROMPT2 settings. We can now see instantly whether a transaction is active, in dire straits, or not running at all. This makes command-line handling a bit easier, and the expectation is that fewer errors will happen.
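If you are still on an older version, you can get the same behavior by adding %x to the prompt yourself, for example in ~/.psqlrc. The following sketch simply mirrors the new default prompts:
\set PROMPT1 '%/%R%x%# '
\set PROMPT2 '%/%R%x%# '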
Improving pgbench
pgbench is one of the most commonly used tools to benchmark a PostgreSQL database. In older versions, pgbench created some standard tables. With the introduction of PostgreSQL 13, the default dataset can now be partitioned (out of the box).
Here is what happens:
% pgbench -i -s 100 --partitions=10
...
done in 19.70 s (drop tables 0.00 s, create tables 0.03 s, generate 7.34 s, vacuum 10.24 s, primary keys 2.08 s).
test=# \d+
List of relations
Schema | Name                | Type              | ...
--------+---------------------+-------------------+ ...
public | pgbench_accounts    | partitioned table | ...
public | pgbench_accounts_1  | table             | ...
public | pgbench_accounts_10 | table             | ...
public | pgbench_accounts_2  | table             | ...
public | pgbench_accounts_3  | table             | ...
...
The --partitions option tells pgbench how many partitions to create. It is now much easier to benchmark a partitioned table against the non-partitioned default dataset.
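If range partitioning (the default) is not what you want to test, pgbench also accepts a --partition-method option. Here is a quick sketch using hash partitioning instead:
% pgbench -i -s 100 --partitions=10 --partition-method=hash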
Generating random UUIDs more easily
In older versions, we had to load an extension (such as pgcrypto or uuid-ossp) to handle UUIDs. Many users did not know that those extensions existed or did not want to enable them for whatever reason. In PostgreSQL 13, it is possible to generate random UUIDs without any extension. Here is how it works:
test=# SELECT gen_random_uuid();
gen_random_uuid
--------------------------------------
ca1754dc-b5d5-442b-a40b-9c6a9d51a9a1
(1 row)
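The function also works nicely as a column default. Here is a small sketch (t_object is just a made-up example table):
test=# CREATE TABLE t_object (
    id      uuid DEFAULT gen_random_uuid() PRIMARY KEY,
    payload text
);
CREATE TABLE
test=# INSERT INTO t_object (payload) VALUES ('some data');
INSERT 0 1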
Dropping databases faster
The ability to drop databases faster is an important one and definitely one of my favorite new features. What was the problem?
postgres=# DROP DATABASE test;
ERROR: database "test" is being accessed by other users
DETAIL: There is 1 other session using the database.
A database could only be dropped if nobody else was connected to it. For many users, this was hard to achieve: new connections often come in all the time, and blocking new connections with ALTER DATABASE and then terminating the existing ones just to drop the database was simply too cumbersome. The WITH (force) option dramatically simplifies things from the end user's perspective:
postgres=# DROP DATABASE test WITH (force);
DROP DATABASE
The database will be dropped regardless of what is going on in the system, which makes the process much more reliable.
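For comparison, the old approach looked roughly like the following sketch (note that a connection limit of 0 does not keep superusers out):
postgres=# ALTER DATABASE test CONNECTION LIMIT 0;
ALTER DATABASE
postgres=# SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE datname = 'test' AND pid <> pg_backend_pid();
 pg_terminate_backend
----------------------
 t
(1 row)
postgres=# DROP DATABASE test;
DROP DATABASE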
Adding ALTER TABLE ... DROP EXPRESSION ...
PostgreSQL has the ability to materialize the output of an expression. The following listing shows how a generated column can be used:
test=# CREATE TABLE t_test (
a int,
b int,
c int GENERATED ALWAYS AS (a * b) STORED
);
CREATE TABLE
test=# INSERT INTO t_test (a, b) VALUES (10, 20);
INSERT 0 1
As you can see, one row has been added to the table:
test=# SELECT * FROM t_test;
a | b | c
----+----+-----
10 | 20 | 200
(1 row)
The question now is: How can we get rid of generated expressions again? PostgreSQL 13 has the answer:
test=# ALTER TABLE t_test ALTER COLUMN c DROP EXPRESSION;
ALTER TABLE
c is now an ordinary column just like all the others:
test=# \d t_test;
Table "public.t_test"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
c | integer | | |
You can now insert into c directly without getting an error.
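A quick test with arbitrary values confirms that:
test=# INSERT INTO t_test (a, b, c) VALUES (5, 6, 100);
INSERT 0 1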
Making use of performance improvements
PostgreSQL 13 adds a couple of performance improvements. In this section, you will learn about some of the more relevant improvements that can make a difference in real life. Of course, there are hundreds of small changes that have been made here, but the aim of this chapter is really to focus on the most important aspects that make a real difference.
Deduplication of B-tree indexes
B-trees are by far the most important index structure in the PostgreSQL world; most indexes out there are B-trees. In PostgreSQL 13, B-trees have been improved substantially. In older versions, identical entries were stored individually in the index: if you had 1 million table entries for hans, the index would actually contain 1 million copies of "hans" in the data structure as well. Up to PostgreSQL 12, the typical index entry was "(value, ctid)". This has changed: PostgreSQL 13 stores a duplicated value only once, together with a list of row pointers, and therefore handles duplicates much more efficiently.
Let's create an example and see how this works:
test=# CREATE TABLE tab (a int, b int);
CREATE TABLE
test=# INSERT INTO tab SELECT id, 1 FROM generate_series(1, 5000000) AS id;
INSERT 0 5000000
In this case, we have created 5 million rows. The a column has 5 million different values while the b column contains 5 million identical entries. Let's create two indexes and see what happens:
test=# CREATE INDEX idx_a ON tab (a);
CREATE INDEX
test=# CREATE INDEX idx_b ON tab (b);
CREATE INDEX
The size of the table itself has not changed in PostgreSQL 13. We will see the same size as before:
test=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+------+-------+-------+-------------+--------+-------------
public | tab | table | hs | permanent | 173 MB |
(1 row)
However, what is important is that the size of the idx_b index will be different from the size of the idx_a index:
test=# \di+
List of relations
Schema | Name | Type | Owner | Table | Persistence | Size | Description
--------+-------+-------+-------+-------+-------------+--------+-------------
public | idx_a | index | hs | tab | permanent | 107 MB |
public | idx_b | index | hs | tab | permanent | 33 MB |
(2 rows)
As you can see, PostgreSQL 13 is much more efficient and offers a much lower storage footprint. The lower space consumption will automatically translate to higher cache hit rates, and therefore, generally better performance.
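If you want to compare the old and the new behavior on the same server, deduplication can be switched off per index using the deduplicate_items storage parameter (idx_b_nodedup is just a throwaway name for this sketch). Check its size with \di+ and drop it again so that the rest of the example stays unchanged:
test=# CREATE INDEX idx_b_nodedup ON tab (b) WITH (deduplicate_items = off);
CREATE INDEX
test=# DROP INDEX idx_b_nodedup;
DROP INDEX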
Adding incremental sorting
PostgreSQL 13 has one more improvement around indexing and sorting: incremental sorting. What does that mean? Suppose we want a result ordered by two columns, a and b, and the rows already come back sorted by a because there is an index on a. In that case, PostgreSQL only has to sort the rows within each group of identical a values to add b to the sort order. That is exactly what incremental sorting does.
To show how this works, let's drop the idx_b index first:
test=# DROP INDEX idx_b;
DROP INDEX
We are left with one index (idx_a), which provides a list sorted by a. The following query wants a result ordered by a and b:
test=# explain SELECT * FROM tab ORDER BY a, b;
QUERY PLAN
----------------------------------------------------------------------
Incremental Sort (cost=0.47..376979.43 rows=5000000 width=8)
Sort Key: a, b
Presorted Key: a
-> Index Scan using idx_a on tab (cost=0.43..151979.43 rows=5000000 width=8)
(4 rows)
PostgreSQL will do the incremental sort fed by the idx_a index. The total runtime on my machine is around 1.4 seconds (using the default configuration):
test=# explain analyze SELECT * FROM tab ORDER BY a, b;
QUERY PLAN
---------------------------------------------------------------------
Incremental Sort (cost=0.47..376979.43 rows=5000000 width=8)
(actual time=0.167..1297.696 rows=5000000 loops=1)
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 156250 Sort Method: quicksort
Average Memory: 26kB Peak Memory: 26kB
-> Index Scan using idx_a on tab (cost=0.43..151979.43 rows=5000000 width=8)
(actual time=0.069..773.260 rows=5000000 loops=1)
Planning Time: 0.068 ms
Execution Time: 1437.362 ms
(7 rows)
To see what would have happened in PostgreSQL 12 and before, we can turn off incremental sorting temporarily:
test=# SET enable_incremental_sort TO off;
SET
test=# explain analyze SELECT * FROM tab ORDER BY a, b;
QUERY PLAN
-----------------------------------------------------------------
Sort (cost=765185.42..777685.42 rows=5000000 width=8)
(actual time=1269.250..1705.754 rows=5000000 loops=1)
Sort Key: a, b
Sort Method: external merge Disk: 91200kB
-> Seq Scan on tab (cost=0.00..72124.00 rows=5000000 width=8)
(actual time=0.032..311.777 rows=5000000 loops=1)
Planning Time: 0.057 ms
Execution Time: 1849.416 ms
(6 rows)
PostgreSQL falls back to the old execution plan, which takes roughly 400 ms longer – quite a substantial difference. But there is more: the memory footprint is also much lower. The old mechanism had to sort the entire dataset, spilling around 90 MB to disk in this example, while the new mechanism needs practically no memory to speak of. In other words, incremental sorting is not just about speed; it also reduces memory consumption quite dramatically.
Adding -j 8 to reindexdb
The reindexdb command has been around for many years. It can be used to reindex an entire database easily. In PostgreSQL 13, reindexdb has been extended: it now supports the -j flag, which allows you to reindex a database using more than one connection (and therefore more than one CPU core) at a time:
$ reindexdb --help
reindexdb reindexes a PostgreSQL database.
Usage:
reindexdb [OPTION]... [DBNAME]
Options:
-a, --all reindex all databases
--concurrently reindex concurrently
-d, --dbname=DBNAME database to reindex
-e, --echo show the commands being sent to the server
-i, --index=INDEX recreate specific index(es) only
-j, --jobs=NUM use this many concurrent connections to reindex
...
The performance difference can be quite substantial as shown in the next listing:
$ time reindexdb -j 8 database_name
real 0m6.789s
user 0m0.012s
sys 0m0.008s
$ time reindexdb database_name
real 0m24.137s
user 0m0.001s
sys 0m0.004s
In this case, 8 parallel jobs have been used to speed up the process. Keep in mind that the -j option only speeds things up if there are enough tables that can be reindexed concurrently. If you have just a single table with a single index, no improvement will happen. The more large tables you have, the better the results will be.
Allowing hash aggregates to spill to disk
When running simple GROUP BY statements, PostgreSQL basically has two options to execute these types of queries:
- Group aggregate
- Hash aggregate
A group aggregate is typically used when the number of groups is really large. If you group billions of people by phone number, the number of groups is high and PostgreSQL cannot keep them all in memory; a group aggregate is the preferred method to execute this type of query. The second option is a hash aggregate. Suppose you group billions of people by gender: the number of resulting groups is small because there are only a handful of genders. However, what if the planner gets it wrong and produces a far too low estimate for the number of groups? Since PostgreSQL 13, a hash aggregate can spill data to disk once it uses up more than work_mem.
To see what happens, I will first run a simple GROUP BY statement:
test=# explain SELECT a, count(*) FROM tab GROUP BY 1;
QUERY PLAN
-------------------------------------------------------------------
GroupAggregate (cost=0.43..204852.43 rows=5000000 width=12)
Group Key: a
-> Index Only Scan using idx_a on tab (cost=0.43..129852.43 rows=5000000 width=4)
(3 rows)
PostgreSQL expects the number of groups to be 5 million and goes for a group aggregate. But what if we make index access more expensive (by raising random_page_cost) and tell PostgreSQL that there is really enough RAM to handle this operation in memory? Let's try it out:
test=# SET random_page_cost TO 100;
SET
test=# SET work_mem TO '10 GB';
SET
test=# explain SELECT a, count(*) FROM tab GROUP BY 1;
QUERY PLAN
-------------------------------------------------------------------
HashAggregate (cost=97124.00..147124.00 rows=5000000 width=12)
Group Key: a
-> Seq Scan on tab (cost=0.00..72124.00 rows=5000000 width=4)
(3 rows)
The planner will decide on a hash aggregate and execute it in memory as follows:
test=# explain analyze SELECT a, count(*) FROM tab GROUP BY 1;
QUERY PLAN
---------------------------------------------------------------------
HashAggregate (cost=97124.00..147124.00 rows=5000000 width=12)
(actual time=2466.159..3647.708 rows=5000000 loops=1)
Group Key: a
Peak Memory Usage: 589841 kB
-> Seq Scan on tab (cost=0.00..72124.00 rows=5000000 width=4)
(actual time=0.061..615.607 rows=5000000 loops=1)
Planning Time: 0.065 ms
Execution Time: 3868.609 ms
(6 rows)
As you can see, the system used around 600 MB of memory (at peak consumption) to run the operation. But what happens if we set work_mem to just below that value? Let's find out:
test=# SET work_mem TO '500 MB';
SET
test=# explain analyze SELECT a, count(*) FROM tab GROUP BY 1;
QUERY PLAN
-----------------------------------------------------------------------
HashAggregate (cost=97124.00..147124.00 rows=5000000 width=12)
(actual time=2207.005..3778.903 rows=5000000 loops=1)
Group Key: a
Peak Memory Usage: 516145 kB Disk Usage: 24320 kB
HashAgg Batches: 4
-> Seq Scan on tab (cost=0.00..72124.00 rows=5000000 width=4)
(actual time=0.054..545.678 rows=5000000 loops=1)
Planning Time: 0.092 ms
Execution Time: 3970.435 ms
(6 rows)
500 MB of work_mem is available, but the operation needs slightly more, so PostgreSQL sends the excess data to disk. If we reduce the memory even further, PostgreSQL will fall back to a group aggregate. In the case of slightly wrong estimates, however, the new behavior can really be helpful.
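Related to this, PostgreSQL 13 also introduces the hash_mem_multiplier parameter, which lets hash-based operations use a multiple of work_mem before they start spilling. A quick sketch (2.0 is just an example value):
test=# SET hash_mem_multiplier TO 2.0;
SET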
Speeding up PL/pgSQL
PL/pgSQL is the single most popular language for writing stored procedures in PostgreSQL. However, some corners of PL/pgSQL used to be quite slow. PostgreSQL 13 features some performance improvements that help PL/pgSQL code run faster. Here is an example:
CREATE OR REPLACE FUNCTION slow_pi() RETURNS double precision AS $$
DECLARE
a double precision := 1;
s double precision := 1;
r double precision := 0;
BEGIN
FOR i IN 1 .. 10000000 LOOP
r := r + s/a;
a := a + 2;
s := -s;
END LOOP;
RETURN 4 * r;
END;
$$ LANGUAGE plpgsql;
SELECT slow_pi();
slow_pi
--------------------
3.1415925535898497
(1 row)
Time: 13060.650 ms (00:13.061) vs 2108.464 ms (00:02.108)
Our function is a deliberately slow way to calculate pi. The interesting thing is that expression evaluation inside such loops has been accelerated, so the code executes several times faster than before. The beauty is that the code itself does not have to be changed – things simply run faster by default.
Parallelizing VACUUM operations
VACUUM is needed in every single PostgreSQL deployment. Since version 13, VACUUM can process the indexes of a table in parallel. Here is how it works:
VACUUM (verbose ON, analyze ON, parallel 4) some_table;
Simply tell VACUUM how many parallel workers you want and PostgreSQL will try to launch up to that many to process the table's indexes (at most one worker per index).
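The number of workers actually launched is also capped by max_parallel_maintenance_workers, so it is worth checking that setting too (shown here with its default value):
test=# SHOW max_parallel_maintenance_workers;
 max_parallel_maintenance_workers
----------------------------------
 2
(1 row)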
Allowing skipping of WAL for full table writes
When wal_level is set to minimal (which is no longer the default value), you can enjoy a truly interesting performance improvement: if a relation (which is usually a synonym for a table or a materialized view) created or rewritten in a transaction is larger than wal_skip_threshold, PostgreSQL avoids writing WAL for it and syncs the relation files directly rather than going through the transaction log machinery. In older versions, only COPY could do that. Depending on the properties of your storage, raising or lowering this value might help if such commits slow down concurrent transactions to an unacceptable degree.
Keep in mind that wal_level = minimal is not possible if you want to use replication.
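A minimal postgresql.conf sketch could look like this (the values are only examples; 2MB is the default for wal_skip_threshold):
wal_level = minimal         # rules out streaming replication and pg_basebackup
max_wal_senders = 0         # must be 0 when wal_level is minimal
wal_skip_threshold = 2MB    # larger relations are synced directly instead of being WAL-logged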
Additional performance improvements
In addition to the changes we have just seen, there are several more performance enhancements. The first thing worth mentioning is the ability to use ALTER STATISTICS ... SET STATISTICS to control the statistics target of extended statistics. PostgreSQL now also allows multiple extended statistics objects to be used in the same query, which can be quite useful if your queries are fairly complex.
To tell PostgreSQL how much I/O concurrency your storage system can handle, the maintenance_io_concurrency parameter has been added to postgresql.conf:
test=# SHOW maintenance_io_concurrency;
maintenance_io_concurrency
----------------------------
10
(1 row)
This parameter is the counterpart of effective_io_concurrency for maintenance tasks and allows them to issue more concurrent I/O requests if your storage can handle it.
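On fast SSD storage, you might raise the value, for example as follows (100 is an arbitrary number, not a recommendation):
test=# ALTER SYSTEM SET maintenance_io_concurrency = 100;
ALTER SYSTEM
test=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)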
If you happen to operate on really large tables you might like the next feature: large relations can now be truncated a lot faster. This can be really useful if you store data only for a limited period of time.
To store large fields (usually > 1,996 bytes), PostgreSQL uses a technique generally known as TOAST (The Oversized Attribute Storage Technique). PostgreSQL 13 improves the performance of TOAST decompression and retrieves the leading bytes of a TOAST field more efficiently.
Traditionally, you had to use TCP/IP connections on Windows. The problem is that a TCP/IP connection suffers from higher latency than a simple Unix socket. Windows now supports Unix-domain socket connections, which helps to reduce latency for local connections.
Finally, LISTEN / NOTIFY has been improved to allow better performance.
Making monitoring more powerful
Monitoring is key to every successful database operation. This area of PostgreSQL has been improved as well.
Additional system views
To make life easier for administrators and DevOps engineers, the PostgreSQL community has added a couple of easy-to-use system views to version 13.
One of my favorite new ones is a system view that allows us to track the progress of pg_basebackup:
test=# \d pg_stat_progress_basebackup
View "pg_catalog.pg_stat_progress_basebackup"
Column | Type | Collation | Nullable | Default
----------------------+---------+-----------+----------+---------
pid | integer | | |
phase | text | | |
backup_total | bigint | | |
backup_streamed | bigint | | |
tablespaces_total | bigint | | |
tablespaces_streamed | bigint | | |
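A simple way to watch a running base backup is a query such as the following sketch (backup_total may not be known – for example, when size estimation has been disabled – and NULLIF guards against division by zero):
test=# SELECT pid, phase,
       round(100.0 * backup_streamed / NULLIF(backup_total, 0), 1) AS percent_done
       FROM pg_stat_progress_basebackup;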
Previously, it was hard to see how far a backup had progressed. The same used to be true for ANALYZE: on large tables, it can take quite a while, so one more view has been added:
test=# \d pg_stat_progress_analyze
View "pg_catalog.pg_stat_progress_analyze"
Column | Type | Collation | Nullable | Default
---------------------------+---------+-----------+----------+---------
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
phase | text | | |
sample_blks_total | bigint | | |
sample_blks_scanned | bigint | | |
ext_stats_total | bigint | | |
ext_stats_computed | bigint | | |
child_tables_total | bigint | | |
child_tables_done | bigint | | |
current_child_table_relid | oid | | |
pg_stat_progress_analyze tells you about sampling and also provides you with statistics about partitioned tables and so on.
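To see which table is currently being analyzed and how far the sampling has progressed, the view can be joined with pg_class – a quick sketch:
test=# SELECT a.relname, p.phase,
       p.sample_blks_scanned, p.sample_blks_total
       FROM pg_stat_progress_analyze AS p
       JOIN pg_class AS a ON a.oid = p.relid;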
Also important is that new fields have been added to pg_stat_replication:
test=# \d pg_stat_replication
View "pg_catalog.pg_stat_replication"
Column | Type | Collation | Nullable | Default
------------------+-------------+-----------+----------+---------
pid | integer | | |
...
spill_txns | bigint | | |
spill_count | bigint | | |
spill_bytes | bigint | | |
If you are using logical decoding, it might happen that some transactions have to go to disk because there is not enough RAM available to handle the pending changes that are to be applied to replicas. pg_stat_replication can now track this information.
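How much memory logical decoding may use before it starts spilling is controlled by logical_decoding_work_mem, which was also introduced in PostgreSQL 13 (64MB is the default):
test=# SHOW logical_decoding_work_mem;
 logical_decoding_work_mem
----------------------------
 64MB
(1 row)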
One more view that has been added to version 13 is pg_shmem_allocations. It allows users to keep track of allocated shared memory:
test=# \d pg_shmem_allocations
View "pg_catalog.pg_shmem_allocations"
Column | Type | Collation | Nullable | Default
----------------+--------+-----------+----------+---------
name | text | | |
off | bigint | | |
size | bigint | | |
allocated_size | bigint | | |
Basically, you can see which component has allocated how much shared memory. This is especially useful if you are working with many different extensions and other things that allocate shared memory (as opposed to local memory such as work_mem).
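To find the biggest consumers of shared memory, you could run something like this:
test=# SELECT name, allocated_size
       FROM pg_shmem_allocations
       ORDER BY allocated_size DESC
       LIMIT 5;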
Finally, there is pg_stat_slru. The question now is: what is SLRU? Internally, PostgreSQL has to cache more than just tables and indexes; it also caches things such as the CLOG (short for commit log), multixacts, and subtransaction data. This is what the SLRU caches do. If many different transactions are accessed, the commit log (not to be confused with the Write-Ahead Log (WAL), which is also known as the xlog) can become a major source of contention:
test=# \d pg_stat_slru
View "pg_catalog.pg_stat_slru"
Column | Type | Collation | Nullable | Default
--------------+--------------------------+-----------+----------+---------
name | text | | |
blks_zeroed | bigint | | |
blks_hit | bigint | | |
blks_read | bigint | | |
blks_written | bigint | | |
blks_exists | bigint | | |
flushes | bigint | | |
truncates | bigint | | |
stats_reset | timestamp with time zone | | |
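To spot SLRU caches that have to read from disk a lot (a possible sign of contention), a query such as the following can help:
test=# SELECT name, blks_hit, blks_read
       FROM pg_stat_slru
       ORDER BY blks_read DESC;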
All of these system views will make it easier to see what is going on inside PostgreSQL. We expect more views to be added in the future; we have already seen several patches for PostgreSQL 14. Laurenz Albe (at Cybertec PostgreSQL International, my company) has come up with a patch that helps to keep track of connection counts and a lot more.