Difficult areas to estimate
If no data has ever been collected about a table using ANALYZE
, the planner has to make up completely arbitrary estimates. This is the worst sort of statistics issue to have.
Another area where statistics can fail is situations where the query optimizer just doesn't know how to estimate something. You can watch it utterly fail to estimate this simple query correctly:
EXPLAIN ANALYZE SELECT * FROM customers WHERE customerid=customerid; QUERY PLAN ---------- Seq Scan on customers (cost=0.00..726.00 rows=100 width=268) (actual time=0.012..30.012 rows=20000 loops=1) Filter: (customerid = customerid) Total runtime: 55.549 ms
It's obvious to a person that all 20,000 rows will be returned. What's happening is that because it has no idea how selective the test here is, the optimizer has to make a wild guess. The standard estimate it uses in this situation is 0.5% of the rows, which is where the figure of 100 expected rows comes from.
When...