Optimizing storage and managing cleanup
Transactions are an integral part of the PostgreSQL system. However, transactions come with a small price tag attached. As we’ve already shown in this chapter, concurrent users will sometimes be presented with different data; not everybody will get the same data returned by a query. In addition to this, DELETE and UPDATE are not allowed to actually overwrite data, since ROLLBACK would not work otherwise. If you happen to be in the middle of a large DELETE operation, you cannot be sure whether you will be able to use COMMIT or not. On top of that, data is still visible while you perform the DELETE operation, and sometimes, data is even visible once your modification has long since finished.
Consequently, cleanup has to happen asynchronously. A transaction cannot clean up its own mess, and any COMMIT/ROLLBACK might be too early to take care of dead rows.
The solution to this problem is VACUUM. The following code block provides you with a syntax overview:
test=# \h VACUUM
Command:     VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]

where option can be one of:

    FULL [ boolean ]
    FREEZE [ boolean ]
    VERBOSE [ boolean ]
    ANALYZE [ boolean ]
    DISABLE_PAGE_SKIPPING [ boolean ]
    SKIP_LOCKED [ boolean ]
    INDEX_CLEANUP { AUTO | ON | OFF }
    PROCESS_TOAST [ boolean ]
    TRUNCATE [ boolean ]
    PARALLEL integer

and table_and_columns is:

    table_name [ ( column_name [, ...] ) ]

URL: https://www.postgresql.org/docs/15/sql-vacuum.html
VACUUM will visit all of the pages that potentially contain modifications and find all the dead space. The free space that’s found is then tracked by the free space map (FSM) of the relation. Note that VACUUM will, in most cases, not shrink the size of a table. Instead, it will track and find free space inside existing storage files.
Note
Tables will usually have the same size after VACUUM. If there are no valid rows at the end of a table, file sizes can go down, but this is the exception rather than the rule.
What this means to end users will be outlined in the Watching VACUUM at work subsection of this chapter.
Configuring VACUUM and autovacuum
Back in the early days of the PostgreSQL project, people had to run VACUUM manually. Fortunately, those days are long gone. Nowadays, administrators can rely on a tool called autovacuum, which is part of the PostgreSQL server infrastructure. It automatically takes care of cleanup and works in the background. It wakes up once per minute (see autovacuum_naptime = 1min in postgresql.conf) and checks whether there is work to do. If there is work, autovacuum will fork up to three worker processes (see autovacuum_max_workers in postgresql.conf).
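You can inspect the relevant configuration without opening postgresql.conf at all. A quick way is to query the pg_settings view:

SELECT name, setting
FROM pg_settings
WHERE name LIKE 'autovacuum%';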
The main question is, when does autovacuum trigger the creation of a worker process?
Note
Actually, the autovacuum process doesn’t fork processes itself. Instead, it tells the main process to do so. This is done to avoid zombie processes in the event of failure and to improve robustness.
The answer to this question can, again, be found in postgresql.conf, as shown in the following code:

autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_insert_threshold = 1000
The autovacuum_vacuum_scale_factor parameter tells PostgreSQL that a table is worth vacuuming if 20% of its data has been changed. The trouble is that if a table consists of one row, one change is already 100%. It makes absolutely no sense to fork a complete process to clean up just one row. Therefore, autovacuum_vacuum_threshold says that, on top of those 20%, at least 50 rows must have changed. Otherwise, VACUUM won’t kick in. The same mechanism is used when it comes to the creation of optimizer statistics: we need 10% and at least 50 rows to justify new optimizer statistics. Ideally, autovacuum creates new statistics during a normal VACUUM to avoid unnecessary trips to the table.
However, there is more. In the past, autovacuum was not triggered by workloads consisting of only INSERT statements, which could be a major issue. The autovacuum_vacuum_insert_threshold parameter was added to fix exactly this kind of problem. From PostgreSQL 13 onward, autovacuum activity will happen even if only INSERT statements are occurring in the database.
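To see whether autovacuum has recently taken care of your tables, you can consult the pg_stat_user_tables view:

SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;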
Digging into transaction wraparound-related issues
There are two more settings in postgresql.conf that are quite important to understand in order to really make use of PostgreSQL. As we have stated already, understanding VACUUM is key to performance:

autovacuum_freeze_max_age = 200000000
autovacuum_multixact_freeze_max_age = 400000000
To understand the overall problem, it is important to understand how PostgreSQL handles concurrency. The PostgreSQL transaction machinery is based on the comparison of transaction IDs and the states that transactions are in.
Let’s look at an example. If I am transaction ID 4711 and you happen to be 4712, I won’t see you because you are still running. If I am transaction ID 4711 and you are transaction ID 3900, I will see your changes, provided that your transaction has committed. If your transaction has failed, I can safely ignore all of the rows produced by it.
The trouble is that transaction IDs are finite, not unlimited. At some point, they will start to wrap around. In reality, this means that transaction number 5 might actually be after transaction number 800 million. How does PostgreSQL know which one came first? It does so by storing a watermark. At some point, those watermarks will be adjusted, and this is exactly when VACUUM starts to be relevant. By running VACUUM (or autovacuum), you can ensure that the watermark is adjusted in such a way that there are always enough future transaction IDs left to work with.
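You can watch how close your databases are to the wraparound threshold yourself. The following query (a minimal sketch) shows the transaction ID age per database; once age(datfrozenxid) approaches autovacuum_freeze_max_age, a forced freeze run is imminent:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;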
Note
Not every transaction will increase the transaction ID counter. As long as a transaction is still reading, it will only have a virtual transaction ID. This ensures that transaction IDs are not burned too quickly.
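You can observe this behavior directly. Assuming PostgreSQL 13 or higher (where pg_current_xact_id_if_assigned is available), the following sketch shows that a read does not assign a real transaction ID, while an explicit request for one does:

BEGIN;
SELECT pg_current_xact_id_if_assigned();  -- returns NULL: no real transaction ID assigned yet
SELECT pg_current_xact_id();              -- forces a real transaction ID to be assigned
COMMIT;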
The autovacuum_freeze_max_age parameter defines the maximum number of transactions (age) that a table’s pg_class.relfrozenxid field can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table. This value is fairly low because it also has an impact on clog cleanup (the clog, or commit log, is a data structure that stores two bits per transaction, indicating whether a transaction is running, aborted, committed, or still in a subtransaction).
The autovacuum_multixact_freeze_max_age parameter configures the maximum age that a table’s pg_class.relminmxid field can attain before a VACUUM operation is forced to prevent multixact ID wraparound within the table. Freezing tuples is an important performance issue, and there will be more on this process in Chapter 6, Optimizing Queries for Good Performance, where we will discuss query optimization.
In general, trying to reduce the VACUUM load while maintaining operational security is a good idea. A VACUUM operation on large tables can be expensive, and therefore, keeping an eye on these settings makes perfect sense.
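Both parameters can also be set per table if certain tables need more aggressive freezing than the rest of the database. A sketch, again using the hypothetical t_big table:

ALTER TABLE t_big SET (
    autovacuum_freeze_max_age = 100000000,            -- force anti-wraparound VACUUM earlier
    autovacuum_multixact_freeze_max_age = 200000000
);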
A word on VACUUM FULL
Instead of the normal VACUUM, you can also use VACUUM FULL. However, I really want to point out that VACUUM FULL actually locks the table and rewrites the entire relation. In the case of a small table, this might not be an issue. However, if your tables are large, the table lock can really kill you in minutes! VACUUM FULL takes an exclusive lock that blocks upcoming reads and writes, and therefore, some people talking to your database might get the impression that it is actually down. Hence, a lot of caution is advised.
Note
To get rid of VACUUM FULL, I recommend that you check out pg_squeeze (https://www.cybertec-postgresql.com/en/introducing-pg_squeeze-a-postgresql-extension-to-auto-rebuild-bloated-tables/), which can rewrite a table without blocking writes.
Watching VACUUM at work
Now, it is time to see VACUUM in action. I have included this section here because my practical work as a PostgreSQL consultant and supporter (http://www.postgresql-support.com/) indicates that most people only have a very vague understanding of what happens on the storage side. To stress this point again: in most cases, VACUUM will not shrink your tables; space is usually not returned to the filesystem.
Here is my example, which shows you how to create a small table with customized autovacuum settings. The table is filled with 100,000 rows:

CREATE TABLE t_test (id int) WITH (autovacuum_enabled = off);
INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
The idea is to create a simple table containing 100,000 rows. Note that it is possible to turn autovacuum off for specific tables. Usually, this is not a good idea for most applications. However, there is a corner case where autovacuum_enabled = off makes sense. Just consider a table whose life cycle is very short. It doesn’t make sense to clean out tuples if the developer already knows that the entire table will be dropped within seconds. In data warehousing, this can be the case if you use tables as staging areas. autovacuum is turned off in this example to ensure that nothing happens in the background, so that everything you see is triggered by me and not by some hidden process.
First of all, consider checking the size of the table by using the following command:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 3544 kB
(1 row)
The pg_relation_size function returns the size of a table in bytes. The pg_size_pretty function takes this number and turns it into something human-readable.
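Keep in mind that pg_relation_size only counts the main data files of the relation. If you also want to include indexes and TOAST data, pg_total_relation_size is the function to use:

SELECT pg_size_pretty(pg_total_relation_size('t_test'));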
Then, all of the rows in the table are updated using a simple UPDATE statement, as shown in the following code:

test=# UPDATE t_test SET id = id + 1;
UPDATE 100000
What happens here is highly important for understanding PostgreSQL: the database engine has to copy all the rows. Why? First of all, we don’t know whether the transaction will be successful, so the data cannot be overwritten. The second important aspect is that a concurrent transaction might still be seeing the old version of the data. In other words, the UPDATE operation will copy rows.
Logically, the size of the table will be larger after the change has been made:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 7080 kB
(1 row)
After the UPDATE, people might try to return space to the filesystem:

test=# VACUUM t_test;
VACUUM
As we stated previously, VACUUM does not return space to the filesystem in most cases. Instead, it will allow space to be reused. The table, therefore, doesn’t shrink at all:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 7080 kB
(1 row)
However, the next UPDATE will not make the table grow, because it will eat up the free space inside the table. Only a second UPDATE would make the table grow again, because by then all the free space is gone and additional storage is needed:

test=# UPDATE t_test SET id = id + 1;
UPDATE 100000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 7080 kB
(1 row)
test=# UPDATE t_test SET id = id + 1;
UPDATE 100000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 10 MB
(1 row)
If I had to decide on a single thing you should remember after reading this book, this is it. Understanding storage is the key to performance and administration in general.
Let’s run some more queries:
VACUUM t_test;
UPDATE t_test SET id = id + 1;
VACUUM t_test;
Again, the size is unchanged. Let’s see what’s inside the table:
test=# SELECT ctid, * FROM t_test ORDER BY ctid DESC;
    ctid    |  id
------------+-------
 ...
 (1327,46)  | 112
 (1327,45)  | 111
 (1327,44)  | 110
 ...
 (884,20)   | 99798
 (884,19)   | 99797
 ...
The ctid column is the physical position of a row on the disk. By using ORDER BY ctid DESC, you will basically read the table backward in physical order. Why should you care? Because there are some very small values and some very big values at the end of the table. The following code shows how the size of the table changes when data is deleted:
test=# DELETE FROM t_test
       WHERE id > 99000 OR id < 1000;
DELETE 1999
test=# VACUUM t_test;
VACUUM
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 3504 kB
(1 row)
Although only 2% of the data has been deleted, the size of the table has gone down by two-thirds. The reason for this is that if VACUUM only finds dead rows after a certain position in the table, it can return space to the filesystem. This is the only case in which you will actually see the table size go down. Of course, normal users have no control over the physical position of data on the disk. Therefore, storage consumption will most likely stay roughly the same unless all rows are deleted.
Important note
Why are there so many small and big values at the end of the table anyway? After the table is initially populated with 100,000 rows, the last block is not completely full, so the first UPDATE will fill up the last block with changes. This shuffles the end of the table a bit. In this carefully crafted example, this is the reason for the strange layout at the end of the table.
In real-world applications, the impact of this observation cannot be stressed enough. There is no performance tuning without really understanding storage.
Limiting transactions by making use of snapshot too old
VACUUM does a good job, and it will reclaim free space as needed. However, when can VACUUM actually clean out rows and turn them into free space? The rule is this: if a row can no longer be seen by anybody, it can be reclaimed. In reality, this means that everything that is no longer visible, even to the oldest active transaction, can be considered truly dead.
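To find transactions that are currently holding back cleanup, you can inspect pg_stat_activity. A minimal sketch that lists open transactions, oldest first:

SELECT pid, now() - xact_start AS xact_duration, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;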
This also implies that really long transactions can postpone cleanup for quite some time. The logical consequence is table bloat. Tables will grow beyond proportion, and performance will tend to go downhill. Fortunately, starting with PostgreSQL 9.6, the database has a nice feature that allows an administrator to intelligently limit the duration of a transaction. Oracle administrators will be familiar with the snapshot too old error. Since PostgreSQL 9.6, this error message is also available. However, it is more of a feature than an unintended side effect of bad configuration (which it actually is in Oracle).
To limit the lifetime of snapshots, you can make use of a setting in PostgreSQL’s configuration file, postgresql.conf, which holds all of the configuration parameters that are needed for this:
old_snapshot_threshold = -1 # 1min-60d; -1 disables; 0 is immediate
If this variable is set, long-running transactions that try to read data which has been cleaned up in the meantime will fail with a snapshot too old error. Note that this setting is on an instance level and cannot be set inside a session. By limiting the age of a snapshot, the risk of insanely long transactions will decrease drastically.
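A sketch of what enabling the feature might look like; note that old_snapshot_threshold can only be set at server start, so a restart is required after editing postgresql.conf:

old_snapshot_threshold = 6h   # snapshots older than 6 hours may trigger the error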
Making use of more VACUUM features
VACUUM has steadily improved over the years. In this section, you will learn about some of the more recent improvements.
In many cases, VACUUM can skip pages. This is especially true when the visibility map suggests that a block is visible to everyone. VACUUM may also skip a page that is heavily used by some other transaction. DISABLE_PAGE_SKIPPING disables this kind of behavior and ensures that all pages are cleaned during the run.
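Using the t_test table from earlier, a run that processes every single page could look like this:

VACUUM (DISABLE_PAGE_SKIPPING true) t_test;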
One more way to improve on VACUUM is to use SKIP_LOCKED; the idea here is to make sure that VACUUM does not harm concurrency. If SKIP_LOCKED is used, VACUUM will automatically skip over relations that cannot be locked instantly, thus avoiding conflict resolution. This kind of feature can be very useful in the event of heavy concurrency.
One of the important and sometimes overlooked aspects of VACUUM is the need to clean up indexes. After VACUUM has successfully processed a heap, indexes are taken care of. If you want to prevent this from happening, you can make use of INDEX_CLEANUP. By default, INDEX_CLEANUP is set to AUTO, but depending on your workload, you might decide to skip index cleanup in some rare cases. So, what are those rare cases? Why might anybody not want to clean up indexes? The answer is simple: if your database is in danger of shutting down soon due to transaction wraparound, it makes sense to run VACUUM as quickly as possible. If you’ve got a choice between downtime and some kind of postponed cleanup, you should opt for a quick VACUUM to keep your database alive.
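For instance, an emergency anti-wraparound run that avoids both lock waits and index processing might look like this (again using the example table):

VACUUM (SKIP_LOCKED true, INDEX_CLEANUP off) t_test;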
In recent versions of PostgreSQL, the PROCESS_TOAST option has been added. The idea is to give users a chance to skip the associated TOAST table altogether. In real life, there are not too many cases in which this is actually desirable.
The next option that has been added is TRUNCATE, which can be either true or false. As we have mentioned already, VACUUM can cut off a file at the end if it only finds dead tuples there. This behavior can now be controlled. The default value is, of course, true. However, you can disable file truncation if necessary.
For a long time, VACUUM worked using a single CPU core. However, those times have changed for good. It is now possible for VACUUM to make full use of your system and utilize more than one CPU core. Usually, the number of cores is determined automatically, similar to how it is with normal SELECT statements. By using the PARALLEL option, you can decide how many cores VACUUM is supposed to use.
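A sketch that requests up to four parallel workers; note that the number actually used also depends on how many indexes the table has and on max_parallel_maintenance_workers:

VACUUM (PARALLEL 4) t_test;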