Handling conflicting data
As we explored above, Cassandra's masterless replication can lead to situations in which multiple versions of the same record exist on different nodes. Since there is no master node containing the canonical copy of a record, Cassandra must use other means to determine which version of the data is correct.
This situation comes into play when reading data at any consistency level other than ONE. When our application requests a row from Cassandra, we will receive a response with that row's data; each column will contain one value. However, if we're reading at a consistency level such as QUORUM or ALL, Cassandra internally will fetch the copies of the data from multiple nodes; it's possible that the different copies will contain conflicting data. It's up to Cassandra to figure out exactly what to return to us.
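To make the idea concrete, here is a toy model in Python of a coordinator reconciling copies of a row fetched from several replicas. It assumes the rule Cassandra applies to conflicting columns, namely that the value with the most recent write timestamp wins. The names merge_replica_rows, replica_a, and replica_b, along with the sample data, are hypothetical; this is a sketch of the concept, not Cassandra's actual implementation, and it does not model tie-breaking between equal timestamps.

```python
from typing import Dict, List, Tuple

# One replica's copy of a row: column name -> (value, write timestamp).
Row = Dict[str, Tuple[str, int]]


def merge_replica_rows(copies: List[Row]) -> Dict[str, str]:
    """Toy coordinator: for each column, keep the value with the highest timestamp."""
    winners: Row = {}
    for copy in copies:
        for column, (value, timestamp) in copy.items():
            # Equal timestamps are not modeled here; the first copy seen is kept.
            if column not in winners or timestamp > winners[column][1]:
                winners[column] = (value, timestamp)
    # The client sees exactly one value per column, without timestamps.
    return {column: value for column, (value, _) in winners.items()}


# Hypothetical data: the two replicas disagree about the "email" column.
replica_a = {"email": ("alice@old.example", 1000), "location": ("Boston", 1000)}
replica_b = {"email": ("alice@new.example", 2000), "location": ("Boston", 1000)}

result = merge_replica_rows([replica_a, replica_b])
print(result)  # {'email': 'alice@new.example', 'location': 'Boston'}
```

Note that the comparison happens column by column rather than row by row, so the row returned to us may combine values drawn from different replicas.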
The problem is most acute when different clients are writing the same piece of data concurrently. Let's return to a scenario we explored...