Data distribution
One of the key features of Cassandra is auto-sharding: data is automatically distributed among the nodes in a cluster based on partition keys. A partition key is a column, or a set of columns, that forms part of the primary key of a column family. A partitioner determines how a token is calculated from the partition key, and data is distributed according to that token. Each node of a Cassandra cluster owns a range of tokens. A row is stored on the node that owns the token of the row's partition key.
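The token-to-node mapping described above can be sketched in a few lines of Python. This is only an illustration, not Cassandra's implementation: the names token_for and TokenRing, the node names, and the token values are hypothetical, and MD5 is used as a stand-in hash rather than Murmur3. It assumes each node owns the range that ends at its own token, with the ring wrapping around after the highest token.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Stand-in partitioner (assumption): hash the key to a signed 64-bit token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Hypothetical ring: each node owns the tokens up to and including its own."""

    def __init__(self, node_tokens):
        # Sort (token, node) pairs so the ring can be searched with bisect.
        self._entries = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_of_token(self, token):
        # First node whose token is >= the given token owns it.
        idx = bisect.bisect_left(self._tokens, token)
        if idx == len(self._tokens):
            idx = 0  # past the highest token: wrap around to the first node
        return self._entries[idx][1]

    def owner_of_key(self, partition_key):
        return self.owner_of_token(token_for(partition_key))


# Three illustrative nodes splitting the signed 64-bit token space.
ring = TokenRing({"node-a": -2**62, "node-b": 0, "node-c": 2**62})
```

Calling ring.owner_of_key("user:42") always returns the same node for the same key, which is why a row can be located from its partition key alone.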
A partitioner can be set using the partitioner configuration option in cassandra.yaml. A new cluster should use Murmur3Partitioner, as it is faster than the older partitioners and also distributes data more efficiently. The other partitioners, kept for backward compatibility, are RandomPartitioner, ByteOrderedPartitioner, and OrderPreservingPartitioner.
Here is a brief description of all the listed partitioners:
Murmur3Partitioner: This is the default...