How Cassandra stores data
Database systems use a variety of structures to represent data on disk. Most traditional relational systems use a tabular approach, which enables the kinds of random access queries these systems support. To achieve its hallmark write performance, however, Cassandra must avoid the random disk seeks that such access requires, because random disk I/O tends to be a significant bottleneck. Instead, it employs a log-structured storage engine, which allows it to write data sequentially to both a commit log and its permanent on-disk structure, SSTables.
Implications of log-structured storage
When Cassandra receives a write, it writes the data simultaneously to the commit log and to an in-memory representation of the table, called a memtable. The commit log is what provides durability of writes in Cassandra. Memtables are then periodically flushed to disk in the form of immutable SSTables.
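The write path just described can be sketched in a few lines of Python. This is a toy model for illustration only, not Cassandra's implementation: the class name TinyLogStructuredStore, the flush_threshold parameter, the JSON-lines file format, and the file names are all assumptions made up for this example. It shows the three moving parts: an append-only commit log, an in-memory memtable, and a flush that writes the memtable out as a sorted file that is never modified afterward.

```python
import json
import os
import tempfile


class TinyLogStructuredStore:
    """Toy write path: commit log + memtable + immutable sorted files.

    All names and formats here are hypothetical; real Cassandra differs.
    """

    def __init__(self, directory, flush_threshold=3):
        self.directory = directory
        # Hypothetical knob: flush once the memtable holds this many keys.
        self.flush_threshold = flush_threshold
        self.memtable = {}
        self.sstables = []  # paths of flushed files, oldest first
        self.commit_log_path = os.path.join(directory, "commit.log")
        # Append mode: every write lands at the end of the file.
        self.commit_log = open(self.commit_log_path, "a", encoding="utf-8")

    def write(self, key, value):
        # 1. Sequential append to the commit log, forced to disk for durability.
        self.commit_log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.commit_log.flush()
        os.fsync(self.commit_log.fileno())
        # 2. Update the in-memory table; no disk seek is involved.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable to a new sorted file, then start a fresh memtable."""
        if not self.memtable:
            return
        name = f"sstable-{len(self.sstables):04d}.jsonl"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.memtable):
                f.write(json.dumps({"key": key, "value": self.memtable[key]}) + "\n")
        # The flushed file is never reopened for writing in this sketch.
        self.sstables.append(path)
        self.memtable = {}
        # Simplification: commit log entries covered by the flush are kept.

    def close(self):
        self.commit_log.close()


if __name__ == "__main__":
    store = TinyLogStructuredStore(tempfile.mkdtemp(), flush_threshold=2)
    store.write("user:2", "bob")
    store.write("user:1", "alice")  # second key reaches the threshold and flushes
    print(store.sstables)
    store.close()
```

Every disk operation in the sketch is either an append or the creation of a whole new file, which is the property that lets a log-structured engine avoid random seeks on the write path.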
Data in SSTables is split into partitions (which map to the primary key) and...