Avoiding auto-freezing and page corruptions
In the life cycle of a row, there are two routes that a row can take in PostgreSQL – a row version dies and needs to be removed by VACUUM
, or a row version gets old enough and needs to be frozen, a task that is also performed by the VACUUM
process. The removal of dead rows is easy to understand, while the second seems strange and surprising.
PostgreSQL uses internal transaction identifiers that are 4 bytes long, so we only have 232 transaction IDs (about four billion). PostgreSQL starts again from the beginning when that wraps around, circularly allocating new identifiers. The reason we do this is that moving to an 8-byte identifier has various other negative effects and costs that we would rather not pay for, so we keep the 4-byte transaction identifier. The impact is that we need to do regular sweeps of the entire database to mark tuples as frozen, meaning they are visible to all users – that's...