Using secondary indexes to avoid denormalization
So far, we've exclusively used primary key columns to look up rows: either the full primary key when we're looking for a specific row, or just the partition key when retrieving multiple rows in a single partition. We know that these kinds of lookups are very efficient because Cassandra can satisfy the query by accessing the single region of storage that holds the partition's data in order. This is the motivation for the denormalized follow structure we've built in this chapter: whether we want to answer the question Who does alice follow? or the question Who follows alice?, we can construct a query that only needs to access a single partition. However, we're accepting additional complexity in the form of storing two versions of the same information in user_inbound_follows and user_outbound_follows.
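To make the trade-off concrete, here is a minimal sketch of the two single-partition lookups and the double write they require. The column names follower_username and followed_username are assumptions for illustration, as is the choice of partition key for each table (follower_username for user_outbound_follows, followed_username for user_inbound_follows); the schemas defined earlier in the chapter may differ.

```cql
-- Who does alice follow? Reads one partition of user_outbound_follows.
-- (Assumes follower_username is that table's partition key.)
SELECT followed_username
FROM user_outbound_follows
WHERE follower_username = 'alice';

-- Who follows alice? Reads one partition of user_inbound_follows.
-- (Assumes followed_username is that table's partition key.)
SELECT follower_username
FROM user_inbound_follows
WHERE followed_username = 'alice';

-- The cost of denormalization: recording a single follow relationship
-- means writing the same fact to both tables.
INSERT INTO user_outbound_follows (follower_username, followed_username)
VALUES ('alice', 'bob');

INSERT INTO user_inbound_follows (followed_username, follower_username)
VALUES ('bob', 'alice');
```

Each SELECT restricts only the partition key, so it touches a single partition, but the application has to keep the two tables consistent with each other on every write.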
As it happens, Cassandra does provide us with a way to answer both questions in a reasonably efficient way using a single table with a single representation...