Hash generation
The standard practice for most PL/Proxy implementations is to partition, based on the hashtext
of the field you're splitting on, which allows splitting among a number of nodes fairly without knowing the distribution of the data set in advance. hashtext
is a PostgreSQL-provided internal function that takes in any text input and generates an integer as output with a hash code. If you AND
the result at the bit level, to only take the lowest few bits, this turns out to be a very quick and fair way to distribute load among multiple systems. For this to work, the number of partitions needs to be a power of 2 (2, 4, 8, 16, and so on) and then you Bitwise AND
against one less than that number. So for two partitions that's & 1
, for four partitions it's & 3
, and so on. Here's what the output from it looks like for the first ten integers hashed into four partitions:
SELECT s,hashtext(s::text) & 3 AS hash FROM generate_series(1,10) AS s; s | hash...