Clustering for efficiency
Most users stop optimizing the performance of a table after adding the appropriate indices, usually because performance has become good enough. But what if the table has millions or billions of records? That much data may not fit in the database server's RAM, which forces reads from the hard drive. Table records are generally stored sequentially on the hard drive, but the records fetched for a single query may be scattered across many different parts of it. Having to access many different parts of a hard drive is a known performance limitation.
To mitigate hard drive performance issues, a database table can have its records reordered on the hard drive so that similar records are stored next to or near each other. This reordering is known as clustering, and in PostgreSQL it is performed with the CLUSTER statement.
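As a minimal sketch of what clustering looks like in practice, the statements below physically reorder a table according to one of its indexes. The caschools table name comes from this recipe; the geometry column name geom and the index name caschools_geom_idx are assumptions made for illustration and may differ in your data.

```sql
-- Assumption: caschools has a PostGIS geometry column named geom.
-- Assumption: caschools_geom_idx is a hypothetical index name.
CREATE INDEX caschools_geom_idx ON caschools USING gist (geom);

-- Rewrite the table on disk in the order of the chosen index.
CLUSTER caschools USING caschools_geom_idx;

-- Refresh the planner statistics after the rewrite.
ANALYZE caschools;

-- PostgreSQL remembers the index used, so a later re-cluster
-- does not need to name it again.
CLUSTER caschools;
```

CLUSTER is a one-time operation: rows inserted or updated afterwards are not kept in clustered order, so the statement has to be rerun periodically. It also holds an exclusive lock on the table while it runs, which is worth planning around on a busy server.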
Getting ready
We will use the California schools (caschools) and San Francisco boundaries...