Secondary indices
If range queries can be considered optimal for Cassandra's storage engine, queries based on a secondary index fall at the other end of the spectrum. Secondary indices have been part of Cassandra since the 0.7 release, and they are certainly an alluring feature. For those accustomed to modeling data in relational databases, creating an index is often the go-to strategy for improving query performance. However, like many relational habits carried over to Cassandra, this strategy translates poorly.
To start, let's get familiar with what secondary indices are and how they work. The purpose of an index is to allow query-by-value functionality, which Cassandra's storage model does not support natively: rows are located by their partition key, not by the values stored in their columns. This should be a clue as to the potential danger of relying on the index functionality.
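To make the query-by-value problem concrete, here is a toy Python model, not Cassandra's actual implementation. It assumes that each node keeps a local index covering only the rows it stores (the way Cassandra's built-in secondary indices are organized), places partitions with a simple hash instead of a real partitioner, and ignores replication entirely. ToyNode, ToyCluster and their methods are hypothetical names invented for this sketch, and the sample rows are made up.

```python
from collections import defaultdict
from zlib import crc32


class ToyNode:
    """One node: stores whole partitions plus a *local* index over one column."""

    def __init__(self):
        self.rows = {}                       # partition key -> row dict
        self.local_index = defaultdict(set)  # column value -> partition keys on THIS node only

    def write(self, key, row, indexed_column):
        old = self.rows.get(key)
        if old is not None:
            # An overwrite must also drop the stale index entry for the old value.
            self.local_index[old[indexed_column]].discard(key)
        self.rows[key] = row
        self.local_index[row[indexed_column]].add(key)


class ToyCluster:
    """Hypothetical model: no replication, placement by a simple hash of the key."""

    def __init__(self, node_count, indexed_column):
        self.nodes = [ToyNode() for _ in range(node_count)]
        self.indexed_column = indexed_column

    def _owner(self, key):
        # Stand-in for a partitioner: the partition key alone decides placement.
        return self.nodes[crc32(key.encode()) % len(self.nodes)]

    def insert(self, key, row):
        self._owner(key).write(key, row, self.indexed_column)

    def get_by_key(self, key):
        # Query by partition key: exactly one node is contacted.
        return self._owner(key).rows.get(key), 1

    def get_by_value(self, value):
        # Query by indexed value: the value says nothing about placement,
        # so every node has to consult its own local index.
        results = []
        for node in self.nodes:
            results.extend(node.rows[k] for k in node.local_index.get(value, ()))
        return results, len(self.nodes)


if __name__ == "__main__":
    # Hypothetical sample rows for an authors table keyed by author id.
    cluster = ToyCluster(node_count=6, indexed_column="publisher")
    cluster.insert("author-1", {"name": "Author One", "publisher": "Publisher A"})
    cluster.insert("author-2", {"name": "Author Two", "publisher": "Publisher B"})
    cluster.insert("author-3", {"name": "Author Three", "publisher": "Publisher A"})

    row, contacted = cluster.get_by_key("author-2")
    print(row["name"], "- nodes contacted:", contacted)

    rows, contacted = cluster.get_by_value("Publisher A")
    print(sorted(r["name"] for r in rows), "- nodes contacted:", contacted)
```

A lookup by partition key is answered by the single node that owns the key, while get_by_value must ask every node, because a publisher value carries no hint about where matching rows live. In a real cluster with replication, the coordinator can cover the token ring with fewer than all nodes, but an index query still fans out across the cluster instead of going to one replica set.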
As an example, suppose we want to be able to query authors for a given publisher. Recall that in our earlier authors table, the publisher column has no special properties. It is a simple...