Understanding basic locking
In this section, you will learn about basic locking mechanisms. The goal is to understand how locking works in general and how to get simple applications right.
To show you how things work, we will create a simple table. For demonstration purposes, I will add one row to the table using a simple INSERT command:
```
test=# CREATE TABLE t_test (id int);
CREATE TABLE
test=# INSERT INTO t_test VALUES (0);
INSERT 0 1
```
The first important thing is that tables can be read concurrently. Many users reading the same data at the same time won’t block each other. This allows PostgreSQL to handle thousands of users without any problems.
The question now is: what happens if reads and writes occur at the same time? Here is an example. Let's assume that the table contains one row and that its id = 0:
| Transaction 1 | Transaction 2 |
| --- | --- |
| BEGIN; | BEGIN; |
| UPDATE t_test SET id = 1; | |
| SELECT * FROM t_test; | SELECT * FROM t_test; |
| User will see 1 | User will see 0 |
| COMMIT; | COMMIT; |

Table 2.1 – Transaction isolation
Two transactions are opened. The first one changes a row. However, this is not a problem, as the second transaction can proceed: it will still see the old version of the row as it was before the UPDATE. This behavior is called Multi-Version Concurrency Control (MVCC).
Note
A transaction will see data only if it was committed by the writing transaction before the reading transaction started. One transaction cannot inspect the changes made by another active connection; it can see only changes that have already been committed.
There is also a second important aspect – many commercial or open source databases are still unable to handle concurrent reads and writes. In PostgreSQL, this is absolutely not a problem – reads and writes can coexist.
Note
Write transactions won’t block read transactions.
After the transaction has been committed, the table will contain 1. What will happen if two people change the same data at the same time? Here is an example:
| Transaction 1 | Transaction 2 |
| --- | --- |
| BEGIN; | BEGIN; |
| UPDATE t_test SET id = id + 1 RETURNING *; | |
| It will return 2 | |
| | UPDATE t_test SET id = id + 1 RETURNING *; |
| | It will wait for transaction 1 |
| COMMIT; | It will wait for transaction 1 |
| | It will reread the row, find 2, set the value to 3, and return 3 |
| | COMMIT; |

Table 2.2 – Handling concurrent updates
Suppose you want to count the number of hits on a website. If you run the preceding code, no hits will be lost, because PostgreSQL guarantees that one UPDATE statement is performed after the other.
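To make the hit-counter idea concrete, here is a minimal sketch; the t_hits table and its columns are made up purely for illustration:

```sql
-- Hypothetical hit counter (table and column names are illustrative only)
CREATE TABLE t_hits (page text PRIMARY KEY, hits int DEFAULT 0);
INSERT INTO t_hits VALUES ('/index.html', 0);

-- Any number of sessions can run this concurrently. PostgreSQL serializes
-- the UPDATE statements on the affected row, so no increment is ever lost:
UPDATE t_hits SET hits = hits + 1 WHERE page = '/index.html';
```

Each UPDATE briefly locks the row it touches, increments the committed value, and releases the lock on COMMIT; the next waiting UPDATE then rereads the new value, exactly as shown in Table 2.2.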
Note
PostgreSQL will only lock rows affected by UPDATE. So, if you have 1,000 rows, you can theoretically run 1,000 concurrent changes on the same table.
It is also worth noting that you can always run concurrent reads. Our two writes will not block reads.
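If you want to take such a row-level lock explicitly, SELECT ... FOR UPDATE locks only the rows it reads. Here is a sketch using our t_test table, assuming a row with id = 1 exists:

```sql
-- Session 1: lock a single row without blocking readers
BEGIN;
SELECT * FROM t_test WHERE id = 1 FOR UPDATE;

-- Session 2: plain reads still work, and writes to other rows proceed;
-- only a write to the locked row has to wait:
UPDATE t_test SET id = 2 WHERE id = 1;   -- waits until session 1 ends

-- Session 1:
COMMIT;                                  -- session 2 continues now
```

This is the ROW SHARE lock mode discussed in the next section: it keeps the row stable for the duration of the transaction without ever blocking concurrent SELECT statements.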
Avoiding typical mistakes and explicit locking
In my life as a professional PostgreSQL consultant (https://www.cybertec-postgresql.com), I have seen a couple of mistakes that are repeated frequently. If there are constants in life, these typical mistakes are definitely among them.
Here is my favorite:
| Transaction 1 | Transaction 2 |
| --- | --- |
| BEGIN; | BEGIN; |
| SELECT max(id) FROM product; | SELECT max(id) FROM product; |
| The user will see 17 | The user will see 17 |
| The user will decide to use 18 | The user will decide to use 18 |
| INSERT INTO product ... VALUES (18, ...); | INSERT INTO product ... VALUES (18, ...); |
| COMMIT; | COMMIT; |

Table 2.3 – Potential locking-related problems
In this case, there will be either a duplicate key violation or two identical entries. Neither variation of the problem is all that appealing.
One way to fix this problem is to use explicit table locking. The following code shows us the syntax definition of LOCK:
```
test=# \h LOCK
Command:     LOCK
Description: lock a table
Syntax:
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
    | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE

URL: https://www.postgresql.org/docs/15/sql-lock.html
```
As you can see, PostgreSQL offers eight lock modes for locking an entire table, ranging from the lightweight ACCESS SHARE lock to the heavyweight ACCESS EXCLUSIVE lock. The following list shows what these locks do:
- ACCESS SHARE: This type of lock is taken by reads and conflicts only with ACCESS EXCLUSIVE, which is set by DROP TABLE and so on. Practically, this means that a SELECT cannot start if a table is about to be dropped. This also implies that DROP TABLE has to wait until a reading transaction is complete.
- ROW SHARE: PostgreSQL takes this kind of lock in the case of SELECT FOR UPDATE/SELECT FOR SHARE. It conflicts with EXCLUSIVE and ACCESS EXCLUSIVE.
- ROW EXCLUSIVE: This lock is taken by INSERT, UPDATE, and DELETE. It conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
- SHARE UPDATE EXCLUSIVE: This kind of lock is taken by CREATE INDEX CONCURRENTLY, ANALYZE, ALTER TABLE ... VALIDATE CONSTRAINT, and some other flavors of ALTER TABLE, as well as by VACUUM (not VACUUM FULL). It conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.
- SHARE: When an index is created, SHARE locks will be set. It conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
- SHARE ROW EXCLUSIVE: This one is set by CREATE TRIGGER and some forms of ALTER TABLE, and conflicts with everything except ACCESS SHARE and ROW SHARE.
- EXCLUSIVE: This mode conflicts with every other lock mode except ACCESS SHARE. In other words, concurrent transactions can still read the table, but nobody can change it.
- ACCESS EXCLUSIVE: This type of lock is by far the most restrictive one; it prevents concurrent transactions from reading and writing alike.
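If queueing behind a conflicting lock is not acceptable, the NOWAIT keyword shown in the syntax above makes LOCK fail immediately instead of waiting; lock_timeout achieves something similar for a whole session. A sketch:

```sql
BEGIN;
-- Fail immediately with an error instead of queueing behind
-- any conflicting lock held by another session:
LOCK TABLE t_test IN ACCESS EXCLUSIVE MODE NOWAIT;
ROLLBACK;

-- Alternatively, wait at most two seconds for any lock in this session:
SET lock_timeout = '2s';
```

Both variants turn an indefinite wait into a quick error that the application can catch and retry, which is usually preferable in interactive systems.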
Given the PostgreSQL locking infrastructure, one solution to the max problem we outlined previously would be as follows. The example in the following code shows how to lock a table:
```sql
BEGIN;
LOCK TABLE product IN ACCESS EXCLUSIVE MODE;
INSERT INTO product
    SELECT max(id) + 1, ... FROM product;
COMMIT;
```
Keep in mind that this is a pretty nasty way of performing this kind of operation, because nobody else can read or write to the table during your operation. Therefore, ACCESS EXCLUSIVE should be avoided at all costs.
Checking for locks
Checking for locks is not a trivial matter. There are various options. The first one is to see whether a lock is causing an issue at all:
```
test=# SELECT pid, wait_event_type, wait_event, query
       FROM pg_stat_activity
       WHERE datname = 'test';
...
-[ RECORD 3 ]---+---------------------------------------------
pid             | 23068
wait_event_type | Client
wait_event      | ClientRead
query           | lock table t_test in access exclusive mode;
-[ RECORD 4 ]---+---------------------------------------------
pid             | 23071
wait_event_type | Lock
wait_event      | relation
query           | SELECT count(*) FROM t_test;
```
What we can see here is the query causing the lock, as well as the query waiting on the lock (as shown in the wait_event column).
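For more detail than pg_stat_activity provides, the pg_locks system view lists every lock that is currently held or awaited. A sketch of a query joining the two views:

```sql
-- Show all locks taken by sessions connected to the 'test' database.
-- granted = false marks the sessions that are stuck waiting.
SELECT l.pid,
       l.locktype,
       l.relation::regclass AS locked_table,
       l.mode,
       l.granted
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE a.datname = 'test'
ORDER BY l.granted, l.pid;
```

Casting the relation OID to regclass turns it into a readable table name, which makes it easy to spot which relation a waiting session is queued on.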
Considering alternative solutions
There is an alternative solution to this problem. Consider an example where you are asked to write an application to generate invoice numbers. The tax office might require you to create invoice numbers without gaps and without duplicates. How would you do this? Of course, one solution would be a table lock. However, you can really do better. Here is what you can do to handle the numbering problem we are trying to solve:
```
test=# CREATE TABLE t_invoice (id int PRIMARY KEY);
CREATE TABLE
test=# CREATE TABLE t_watermark (id int);
CREATE TABLE
test=# INSERT INTO t_watermark VALUES (0);
INSERT 0 1
test=# WITH x AS (UPDATE t_watermark SET id = id + 1 RETURNING *)
       INSERT INTO t_invoice SELECT * FROM x RETURNING *;
 id
----
  1
(1 row)
```
In this case, we introduced a table called t_watermark. It contains just one row. The WITH part of the statement is executed first: the row is locked, incremented, and the new value is returned. Only one session can do this at a time. The value returned by the CTE is then used in the invoice table, and it is guaranteed to be unique. The beauty of this is that there is only a simple row-level lock on the watermark table, so no reads on the invoice table are blocked. Overall, this approach is more scalable.
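Running the same statement again simply bumps the watermark. Assuming the state left behind by the listing above, a second invocation behaves like this:

```sql
-- The watermark currently holds 1, so this locks the row,
-- increments it to 2, and inserts invoice number 2:
WITH x AS (UPDATE t_watermark SET id = id + 1 RETURNING *)
INSERT INTO t_invoice SELECT * FROM x RETURNING *;
```

A concurrent session issuing the same statement at the same moment waits on the watermark row, then rereads the committed value and produces the next number, so the sequence stays gap-free and duplicate-free without any table lock.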