- Cluster configurations
cluster_name
: This is the identification string for a logical cluster. All nodes in a cluster must have the same value for this configuration.
Default value: The default value is Test Cluster.
listen_address
: The Cassandra node will bind to this address. Other nodes in the cluster can communicate with this node only if it is set correctly. The default is the loopback address localhost, so leaving it unchanged means the node cannot communicate with nodes running on other machines.
Default value: The default value is localhost.
seed_provider
: The seed node helps Cassandra nodes learn about other nodes in the cluster and the ring topology using the Gossip protocol. We'll learn more about the Gossip protocol in later chapters. It has two suboptions: class_name and seeds. The default seeding class takes a comma-delimited list of node addresses. In a multinode cluster, the seed list should have at least one node. This list should be the same on all nodes.
Default value: The default value is class_name: org.apache.cassandra.locator.SimpleSeedProvider with seeds: "127.0.0.1".
Tip
The seed list should have more than one node for fault tolerance of the bootstrapping process.
In a multi-data center cluster, at least one node from each data center should participate as a seed node.
Note
A node cannot be a seed node if it is a bootstrapping node. So, during the bootstrapping process, the node shouldn't be in the seeds list.
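As a rough sketch of how the cluster settings above fit together, the fragment below shows what the relevant part of cassandra.yaml might look like on a non-seed node. The cluster name and all IP addresses are hypothetical values chosen for illustration. The nesting of seeds under parameters follows the layout of the stock cassandra.yaml.

```yaml
# Hypothetical values for illustration; substitute your own name and addresses.
cluster_name: 'Demo Cluster'      # must be identical on every node in the cluster

# Address of this node, reachable from the other machines (not localhost).
listen_address: 192.168.1.13

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Comma-delimited seed list, the same on all nodes. Two seeds are
          # listed for fault tolerance; this node is not one of them.
          - seeds: "192.168.1.11,192.168.1.12"
```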
- Data partitioning
num_tokens
: This configuration defines the number of random tokens this node will hold, and therefore the partition ranges it is responsible for. It is a relative setting. For example, if one node has num_tokens set to 128 while another has 256, the second node handles twice as many partition ranges as the first.
Default value: The default value is 256.
Tip
All nodes with the same hardware capability should have the same number of tokens configured.
partitioner
: This defines the data partitioning algorithm used in the Cassandra cluster. The current default algorithm, Murmur3, is very fast and is considered a better partitioning algorithm than its predecessors. So, when forming a new cluster, you should go with the default value, which is org.apache.cassandra.dht.Murmur3Partitioner.
Note
This setting shouldn't be changed once the data is loaded, as changing this will wipe all data directories, hence deleting data.
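For reference, a minimal sketch of the partitioning settings as they might appear in cassandra.yaml, using the default values described above:

```yaml
# Defaults as described in the text; use the same num_tokens on nodes
# with the same hardware capability.
num_tokens: 256

# Choose this before loading any data and do not change it afterwards.
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
```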
- Storage configurations
data_file_directories
: Using this configuration option, we can set the data storage location.
Default value: The default value is $CASSANDRA_HOME/data/data, or /var/lib/cassandra/data in older versions.
commitlog_directory
: This is the location on disk where Cassandra stores the commit log.
Default value: The default value is $CASSANDRA_HOME/data/commitlog, or /var/lib/cassandra/commitlog in older versions.
Tip
If using non-SSDs, you should have a separate disk for storing the commitlog. Commit logs are append-only, whereas data files are accessed with random seeks, so sharing one disk hurts commit log write performance. Commit log disks can also be smaller, because commitlog space is reusable once the Memtable has been flushed to disk.
saved_caches_directory
: This is the location where cached rows, partition keys, or counters will be saved to disk after a certain duration of time.
Default value: The default value is $CASSANDRA_HOME/data/saved_caches, or /var/lib/cassandra/saved_caches in older versions.
Note
Row caching is disabled by default in cassandra.yaml due to its limited use.
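The storage options above could be combined as in the following sketch, which places the commit log on its own disk as the tip suggests for non-SSD setups. The mount points /mnt/disk1 and /mnt/disk2 are hypothetical; any paths the Cassandra process can write to would do.

```yaml
# Hypothetical mount points: disk1 holds data files and saved caches,
# disk2 is a separate (and possibly smaller) disk dedicated to the commit log.
data_file_directories:
    - /mnt/disk1/cassandra/data

commitlog_directory: /mnt/disk2/cassandra/commitlog

saved_caches_directory: /mnt/disk1/cassandra/saved_caches
```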
- Client configurations
rpc_address
: This is the interface that the Thrift RPC service binds to. You should set it appropriately; with the default, the node accepts no connections from outside the node.
Default value: The default value is localhost.
rpc_port
: This is the Thrift service port.
Default value: The default value is 9160.
native_transport_port
: This is the port on which the CQL native transport will listen for clients; for example, cqlsh or the Java Driver. This will use rpc_address as the connection interface.
Default value: The default value is 9042.
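A minimal sketch of the client-facing settings, assuming the node should accept client connections on a hypothetical address 192.168.1.13 and keep the default ports:

```yaml
# Hypothetical address; it must be reachable by client machines.
rpc_address: 192.168.1.13

rpc_port: 9160                # Thrift clients
native_transport_port: 9042   # CQL native transport (cqlsh, Java Driver); binds to rpc_address
```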
- Security configurations
authenticator
: This configuration specifies whether to use password-based authentication or none. For password-based authentication, authenticator should be set to PasswordAuthenticator. If PasswordAuthenticator is used, a username and hashed password are saved in the system_auth.credentials table.
Default value: The default value is AllowAllAuthenticator, which means no authentication.
authorizer
: This configuration is used if you want to limit permissions on Cassandra objects, for example, tables. To enable authorization, set its value to CassandraAuthorizer. If enabled, it stores authorization information in the system_auth.permissions table.
Default value: The default value is AllowAllAuthorizer, which means authorization is disabled.
Tip
If enabling authentication or authorization, increase the replication factor of the system_auth keyspace.
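As a short sketch, enabling both password-based authentication and authorization would mean replacing the two defaults in cassandra.yaml as follows:

```yaml
# Replaces the default AllowAllAuthenticator (no authentication).
authenticator: PasswordAuthenticator

# Replaces the default AllowAllAuthorizer (authorization disabled).
authorizer: CassandraAuthorizer
```

Note that the replication factor of the system_auth keyspace is a keyspace property rather than a cassandra.yaml setting, so it is changed separately from this file.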
cassandra-env.sh
This file can be used to fine-tune Cassandra. Here, you can set or tune Java environment variables such as MAX_HEAP_SIZE, HEAP_NEWSIZE, and JAVA_OPTS.
cassandra.in.sh
Here, you can alter the default values for environment variables such as JAVA_HOME, CASSANDRA_HOME, and CLASSPATH. It is located in $CASSANDRA_HOME/bin/ in binary tarball installations. Package-based installations put this file inside the /usr/share/cassandra directory.
cassandra-rackdc.properties
The rack and data center configurations for a node are defined here. The default data center is DC1 and the default rack is RAC1.
cassandra-topology.properties
This file maps Cassandra node IPs to data centers and racks.
logback.xml
This file lets you configure the logging properties of Cassandra's system.log. It is not available in older versions of Cassandra.