Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
PostgreSQL 16 Administration Cookbook

You're reading from   PostgreSQL 16 Administration Cookbook Solve real-world Database Administration challenges with 180+ practical recipes and best practices

Arrow left icon
Product type Paperback
Published in Dec 2023
Publisher Packt
ISBN-13 9781835460580
Length 636 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (5):
Arrow left icon
Boriss Mejías Boriss Mejías
Author Profile Icon Boriss Mejías
Boriss Mejías
Jimmy Angelakos Jimmy Angelakos
Author Profile Icon Jimmy Angelakos
Jimmy Angelakos
Simon Riggs Simon Riggs
Author Profile Icon Simon Riggs
Simon Riggs
Gianni Ciolli Gianni Ciolli
Author Profile Icon Gianni Ciolli
Gianni Ciolli
Vibhor Kumar Vibhor Kumar
Author Profile Icon Vibhor Kumar
Vibhor Kumar
+1 more Show less
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. First Steps 2. Exploring the Database FREE CHAPTER 3. Server Configuration 4. Server Control 5. Tables and Data 6. Security 7. Database Administration 8. Monitoring and Diagnosis 9. Regular Maintenance 10. Performance and Concurrency 11. Backup and Recovery 12. Replication and Upgrades 13. Other Books You May Enjoy
14. Index

Forcing a query to use an index

Often, we think we know better than the database optimizer. Most of the time, your expectations are wrong, and if you look carefully, you’ll see that. So, recheck everything and come back later.

It is a classic error to try to get the database optimizer to use indexes when the database has very little data in it. Put some genuine data in the database first, then worry about it. Better yet, load some data on a test server first, rather than doing this in production.

Sometimes, the optimizer gets it wrong. You feel elated—and possibly angry—that the database optimizer doesn’t see what you see. Please bear in mind that the data distributions within your database change over time, and this causes the optimizer to change its plans over time as well.

If you have found a case where the optimizer is wrong, this can sometimes change over time as the data changes. It might have been correct last week and will be correct again next week, or it correctly calculated that a change of plan was required, but it made that change slightly ahead of time or slightly too late. Again, trying to force the optimizer to do the right thing now might prevent it from doing the right thing later, when the plan changes again. So hinting fixes things in the short term, but in the longer term can cause problems to resurface.

In the long run, it is not recommended to try to force the use of a particular index.

Getting ready

Still here? Oh well.

If you really feel this is necessary, then your starting point is to run an EXPLAIN command for your query, so please read the previous recipe first.

How to do it…

The most common problem is selecting too much data.

A typical point of confusion comes from data that has a few very common values among a larger group. Requesting data for very common values costs more because we need to bring back more rows. As we bring back more rows, the cost of using the index increases. Therefore, it is possible that we won’t use the index for very common values, whereas we would use the index for less common values. To use an index effectively, make sure you’re using the LIMIT clause to reduce the number of rows that are returned.

Since different index values might return more or less data, it is common for execution times to vary depending on the exact input parameters. This could cause a problem if we are using prepared statements—the first five executions of a prepared statement are made using “custom plans” that vary according to the exact input parameters. From the sixth execution onward, the optimizer decides whether to use a “generic plan” or not, if it thinks the cost will be lower on average. Custom plans are more accurate, but the planning overhead makes them less efficient than generic plans. This heuristic can go wrong at times and you might need to override it using plan_cache_mode = force_generic_plan or force_custom_plan.

Another technique for making indexes more usable is partial indexes. Instead of indexing all of the values in a column, you might choose to index only a set of rows that are frequently accessed—for example, by excluding NULL or other unwanted data. By making the index smaller, it will be cheaper to access and will fit within the cache better, preventing pointless work by targeting the index at only the important data. Data statistics are kept for such indexes, so it can also improve the accuracy of query planning. Let’s look at an example:

CREATE INDEX ON customer(id)
WHERE blocked = false AND subscription_status = 'paid';

Another common problem is that the optimizer may make errors in its estimation of the number of rows returned, causing the plan to be incorrect. Some optimizer estimation errors can be corrected using CREATE STATISTICS. If the optimizer is making errors, it can be because the WHERE clause contains multiple columns. For example, queries that mention related columns such as state and phone_area_code or city and zip_code will have poor estimates because those pairs of columns have data values that are correlated.

You can define additional statistics that will be collected when you next analyze the table:

CREATE STATISTICS cust_stat1 ON state, area_code FROM cust;

The execution time of ANALYZE will increase to collect the additional stats information, plus there is a small increase in query planning time, so use this sparingly when you can confirm this will make a difference. If there is no benefit, use DROP STATISTICS to remove them again. By default, multiple types of statistics will be collected—you can fine-tune this by specifying just a few types of statistics if you know what you are doing.

Unfortunately, the statistics command doesn’t automatically generate names, so include the table name in the statistics you create since the name is unique within the database and cannot be repeated on different tables. In future releases, we may also add cross-table statistics.

Additionally, you cannot collect statistics on individual fields within JSON documents at the moment, nor collect dependency information between them; this command only applies to whole column values at this time.

Another nudge toward using indexes is to set random_page_cost to a lower value—maybe even equal to seq_page_cost. This makes PostgreSQL prefer index scans on more occasions, but it still does not produce entirely unreasonable plans, at least for cases where data is mostly cached in shared buffers or system disk caches, or underlying disks are solid-state drives (SSDs).

The default values for these parameters are provided here:

random_page_cost = 4;
seq_page_cost = 1;

Try setting this:

set random_page_cost = 2;

See if it helps; if not, you can try setting it to 1.

Changing random_page_cost allows you to react to whether data is on disk or in memory. Letting the optimizer know that more of an index is in the cache will help it to understand that using the index is actually cheaper.

Index scan performance for larger scans can also be improved by allowing multiple asynchronous I/O operations by increasing effective_io_concurrency. Both random_page_cost and effective_io_concurrency can be set for specific tablespaces or for individual queries.

There’s more…

PostgreSQL does not directly support hints, but they are available via an extension.

If you absolutely, positively have to use the index, then you’ll want to know about an extension called pg_hint_plan. It is available for PostgreSQL 9.1 and later versions. For more information and to download it, go to http://pghintplan.sourceforge.jp/. Hints can be added to your application SQL using a special comment added to the start of a query, like this:

/*+ IndexScan(tablename indexname) */ SELECT

It works, but, as I said previously, try to avoid fixing things now and causing yourself pain later when the data distribution changes.

EnterpriseDB (EDB) and Postgres Advanced Server (EPAS) also support hints in an Oracle-style syntax to allow you to select a specific index, like this:

SELECT /*+ INDEX(tablename indexname) */ … rest of query …

EPAS has many compatibility features such as this for migrating application logic from Oracle. See https://www.enterprisedb.com/docs/epas/latest/epas_compat_ora_dev_guide/05_optimizer_hints/ for more information on this.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image