Cross column correlation
Cross column correlation can cause a wrong estimation of the number of rows as PostgreSQL assumes that each column is independent of other columns. In reality, there are a lot of examples where this is not true. For example, one could find patterns where the first and last names in certain cultures are correlated. Another example is the country
and language
preference of the users. To understand cross column correlation, let's create a table called users
, as follows:
CREATE TABLE users ( id serial primary key, name text, country text, language text ); INSERT INTO users(name, country, language) SELECT generate_random_text(8), 'Germany', 'German' FROM generate_series(1, 10); INSERT INTO users(name, country, language) SELECT generate_random_text(8), 'USA', 'English' FROM generate_series(1, 10); VACUUM ANALYZE users;
If one wants to get users whose language is German
and country is Germany
, he/she will end up with a wrong estimation of rows as both columns are...