How Cassandra stores data
Database systems use a variety of structures to represent data on disk. Most traditional relational systems use a tabular approach, which enables random access queries supported by these systems. However, in order to achieve Cassandra's hallmark write performance, it must avoid these sorts of random access disk seeks because random disk I/O tends to be a significant bottleneck. Instead, the system employs a log-structured storage engine, which allows it to write data sequentially to both a commit log and to Cassandra's permanent structure, SSTables.
Implications of a log-structured storage
When a write is received, it is written simultaneously to the commit log and to a memtable. Note that the commit log is what provides durability of writes in Cassandra. Memtables are then periodically flushed to disk in the form of immutable SSTables.
This storage scheme has several important implications related to data modeling:
Writes are immutable. Since writes are always essentially...