Clustering for efficiency
Most users stop optimizing the performance of a table after adding the appropriate indexes, usually because the performance has become "good enough". But what if the table has millions or billions of records? That much data may not fit in the database server's RAM, which forces the server to read from the hard drive. Table records are generally stored sequentially on the hard drive, but the records a particular query needs may be scattered across many different parts of it. Having to access different parts of a hard drive is a known performance limitation.
To mitigate hard drive performance issues, a database table can have its records reordered on the hard drive so that similar records are stored next to or near each other. This reordering of a database table is known as clustering and is performed with the CLUSTER statement in PostgreSQL.
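As a minimal sketch of what this looks like in practice, the statements below cluster a table on a spatial index. The geometry column name (geom) and the index name (caschools_geom_idx) are assumptions made for illustration and may not match the actual table definition.

```sql
-- Assumed names: the geometry column "geom" and the index
-- "caschools_geom_idx" are hypothetical; adjust to your schema.

-- CLUSTER needs an existing index that defines the new physical order.
CREATE INDEX caschools_geom_idx ON caschools USING gist (geom);

-- Rewrite the table on disk in index order.
-- This takes an exclusive lock on the table while it runs.
CLUSTER caschools USING caschools_geom_idx;

-- Refresh planner statistics after the table has been rewritten.
ANALYZE caschools;
```

Clustering is a one-time operation: rows inserted or updated afterwards are not kept in index order, so the statement has to be run again periodically if the table changes.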
Getting ready
We will use the California schools (caschools) and San Francisco boundaries (sfpoly
...