Distributed joins
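With relational databases, we store each data entity in its own table and then join the tables at query time to form the desired view. If we apply this idea to a database like Cassandra, we end up with a distributed join: CQL has no server-side JOIN, so the application must query each table separately and combine the results itself, and because each table's partitions are spread across the nodes of the cluster, those queries can fan out to many different nodes.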
New Cassandra developers, especially those who come from a relational database background, are particularly prone to following this pattern. In the last chapter, we mentioned that denormalization is the key to successful data modeling in Cassandra, and our discussion of secondary indices can help explain the reasons for this.
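To make the cost concrete, here is a minimal Python sketch of a distributed join against a toy in-memory "cluster". Everything in it is a hypothetical stand-in: the `ToyCluster` class, the `orders_by_user` and `products` table names, and the CRC32-based `node_for` mapping are illustrative only. This is not the Cassandra driver API and not Cassandra's actual partitioner. The point is to count how many separate requests the application issues to assemble one view.

```python
from dataclasses import dataclass, field
from zlib import crc32

NUM_NODES = 3  # hypothetical cluster size


def node_for(key: str) -> int:
    """Map a partition key to a node; a stand-in for Cassandra's partitioner."""
    return crc32(key.encode()) % NUM_NODES


@dataclass
class ToyCluster:
    """Tables held as {table: {partition_key: value}}; every read is logged."""

    tables: dict = field(default_factory=dict)
    requests: list = field(default_factory=list)

    def write(self, table: str, key: str, value) -> None:
        self.tables.setdefault(table, {})[key] = value

    def read(self, table: str, key: str):
        # Each read is a separate request to the node that owns the partition.
        self.requests.append((table, key, node_for(key)))
        return self.tables[table][key]


def orders_with_products(cluster: ToyCluster, user_id: str) -> list:
    """Application-side join: one read for the user's orders, then one more
    read per order to fetch the matching product row from another table."""
    orders = cluster.read("orders_by_user", user_id)
    joined = []
    for order in orders:
        product = cluster.read("products", order["product_id"])
        joined.append({"order_id": order["order_id"], "product_name": product["name"]})
    return joined


# Usage: two orders for one user cost three separate requests.
cluster = ToyCluster()
cluster.write("products", "p1", {"name": "lamp"})
cluster.write("products", "p2", {"name": "desk"})
cluster.write("orders_by_user", "alice", [
    {"order_id": "o1", "product_id": "p1"},
    {"order_id": "o2", "product_id": "p2"},
])
print(orders_with_products(cluster, "alice"))
print(len(cluster.requests))  # 3
```

The number of requests grows with the number of joined rows (one read for the orders plus one per order), and because the product keys hash to different nodes, those reads can land anywhere in the cluster.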
Tip
If you find yourself querying multiple large tables and then joining them in your application based on some shared key, you are performing a distributed join. This should almost always be avoided in favor of a denormalized data model. The only exception is very small lookup tables that fit easily in memory. Otherwise, you should always write your data the way you intend to read it.
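For contrast with the distributed join described in the tip, here is a minimal Python sketch of the denormalized alternative, using a hypothetical in-memory table. `OrdersByUser`, `record_order`, and `orders_for` are illustrative names, not a Cassandra API. The product name is copied into each order row at write time, so answering "all orders for a user" takes a single read from one partition.

```python
from collections import defaultdict


class OrdersByUser:
    """Hypothetical denormalized table keyed by user_id.

    Each row already carries the product name, copied in when the order is
    written, so the read path never consults a separate products table.
    """

    def __init__(self) -> None:
        self._partitions = defaultdict(list)
        self.reads = 0  # partition reads issued by the application

    def record_order(self, user_id: str, order_id: str, product_name: str) -> None:
        # Denormalize on write: store everything the query will need.
        self._partitions[user_id].append(
            {"order_id": order_id, "product_name": product_name}
        )

    def orders_for(self, user_id: str) -> list:
        # Read path: a single request to a single partition.
        self.reads += 1
        return list(self._partitions[user_id])


# Usage: the same two-order view now costs one request.
table = OrdersByUser()
table.record_order("alice", "o1", "lamp")
table.record_order("alice", "o2", "desk")
print(table.orders_for("alice"))
print(table.reads)  # 1
```

The trade-off is duplication: the product name is stored once per order, so renaming a product means updating every copy. That write-side cost is what buys a read path that touches only one partition.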
At this point, you should be familiar...