Quickly estimating the number of rows in a table
We don't always need an accurate count of rows, especially on a large table, where counting them may take a long time to execute. Administrators often need to estimate how big a table is so that they can estimate how long other operations may take.
How to do it…
The Postgres optimizer can provide a quick estimate of the number of rows in a table simply by using its statistics:
EXPLAIN SELECT * FROM mytable;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on mytable  (cost=0.00..2640.00 rows=100000 width=97)
(1 row)
We can directly compute a similar number using roughly the same calculation:
SELECT (CASE WHEN reltuples > 0
             THEN pg_relation_size(oid)*reltuples/(8192*relpages)
             ELSE 0
        END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;
This gives us the following output:
 estimated_row_count
---------------------
               99960
(1 row)
Both queries return a row count very quickly, no matter how large the table we are examining is, because they use statistics that were collected in advance. You may want to create a SQL function for the preceding calculation, so that you won't need to retype the SQL code each time.
The following function estimates the total number of rows using a mathematical procedure called extrapolation. In other words, we take the average number of bytes per row from the last statistics collection and apply it to the current table size:
CREATE OR REPLACE FUNCTION estimated_row_count(text)
RETURNS bigint
LANGUAGE sql
AS $$
  SELECT (CASE WHEN reltuples > 0
               THEN pg_relation_size($1)*reltuples/(8192*relpages)
               ELSE 0
          END)::bigint
  FROM pg_class
  WHERE oid = $1::regclass;
$$;
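Once created, the function can be called like any other; assuming the same mytable example, it should return the same estimate as the inline query above:

SELECT estimated_row_count('mytable');
 estimated_row_count
---------------------
               99960
(1 row)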
How it works…
We saw the pg_relation_size() function earlier, so we know that it brings back an accurate value for the current size of the table.
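As a quick reminder, it can be called directly (again using mytable as the example table):

-- exact current size of the table's main data fork, in bytes
SELECT pg_relation_size('mytable');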
When we vacuum a table in Postgres, we record two pieces of information in the pg_class catalog entry for the table. These two items are the number of data blocks in the table (relpages) and the number of rows in the table (reltuples). Some people think they can use the value of reltuples in pg_class directly as an estimate, but it could be severely out of date. You will also be fooled if you use information in the pg_stat_user_tables view, which is discussed in more detail in Chapter 10, Performance and Concurrency.
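To see how stale those recorded values can get, you can compare them against the live block count; this is a quick check, assuming mytable once more as the example:

SELECT relname,
       relpages,   -- blocks recorded at the last VACUUM/ANALYZE
       reltuples,  -- rows recorded at the last VACUUM/ANALYZE
       pg_relation_size(oid) / 8192 AS current_pages  -- blocks on disk right now
FROM pg_class
WHERE oid = 'mytable'::regclass;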
The Postgres optimizer uses the relpages and reltuples values to calculate the average rows per block, which is also known as the average tuple density.
If we assume that the average tuple density remains constant over time, then we can calculate the number of rows using this formula: Row estimate = number of data blocks * rows per block.
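Spelled out as SQL, the extrapolation looks like this; it is just a sketch that is algebraically equivalent to the CASE expression shown earlier, with the zero-guard omitted for clarity:

SELECT ((pg_relation_size(oid) / 8192)      -- current number of 8 KB data blocks
        * (reltuples / relpages))::bigint   -- rows per block at the last VACUUM/ANALYZE
       AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;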
We include some code to handle cases where the reltuples or relpages fields are zero. The Postgres optimizer actually works a little harder than we do in that case, so our estimate isn't very good.
The WHERE oid = 'mytable'::regclass; syntax introduces the concept of object identifier types. The cast is a shorthand trick that converts the name of an object into the object identifier number for that object. The best way to understand this is to think of that syntax as meaning the same as a function named relname2relid().
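For example, the cast works in both directions; the numeric OID shown here is made up, so substitute one from your own catalog:

-- convert a relation name into its OID
SELECT 'mytable'::regclass::oid;
-- and turn an OID back into a relation name (16384 is a hypothetical OID)
SELECT 16384::regclass;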
There's more…
The good thing about the preceding recipe is that it returns a value in about the same time, no matter how big the table is. The bad thing about it is that pg_relation_size() requests a lock on the table, so if any other user has an AccessExclusiveLock lock on the table, then the table size estimate will wait for the lock to be released before returning a value.
Err... so what is an AccessExclusiveLock lock? While performing a SQL maintenance action, such as changing the data type of a column, PostgreSQL will lock out all other actions on that table, including pg_relation_size, which takes a lock in the AccessShareLock mode. For me, a typical case is when I issue some form of SQL maintenance action, such as ALTER TABLE, and the statement takes much longer than I thought it would. At that point, I think, Oh, was that table bigger than I thought? How long will I be waiting? Yes, it's better to calculate that beforehand, but hindsight doesn't get you out of the hole you are in right now. So, we need a way to calculate the size of a table without needing the lock.
A solution is to look at the operating system files that Postgres uses to store data and figure out how large they are, but that requires a higher level of access than most people usually allow. In any case, looking at the files without a lock could cause problems if the table were dropped or changed while we looked.
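One lock-free sketch, not spelled out in this recipe, is to rely on pg_class alone: reading the catalog does not touch the table itself, so relpages gives an approximate size without waiting for any lock, at the cost of being only as fresh as the last VACUUM or ANALYZE:

-- Lock-free but possibly stale: relpages reflects the last VACUUM/ANALYZE,
-- so this won't block behind an AccessExclusiveLock held by, say, ALTER TABLE
SELECT relpages * 8192 AS approx_size_bytes
FROM pg_class
WHERE oid = 'mytable'::regclass;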