The crop of distributed databases that have come to the market in recent years appeals to application developers for several reasons. Their storage capacity is nearly limitless, bounded only by the number of machines you can afford to spin up. Masterless replication makes them resilient to adverse events, handling even a complete machine failure without any noticeable effect on the applications that rely on them. Log-structured storage engines allow these databases to handle high volume write loads without blinking an eye.
But compared to traditional relational databases, not to mention newer document stores, distributed databases are typically feature-poor and inconvenient to work with. Read and write functionality is frequently confined to simple key-value operations, with more complex operations demanding arcane map-reduce implementations. Happily, Cassandra provides all of the benefits of a fully distributed data store while also exposing a familiar, user-friendly data model and query interface.
By the time I began writing this book, Cassandra had seen plenty of improvements with regards to performance and feature set since its inception. The earliest versions of Cassandra were optimized for fast and large volumes of writes. The read performance was good, but not at par with the write performance. Several improvements were made to make reads considerably faster, such as the addition of bloom filters, caching mechanisms, better indexing, and partitioning.
Over the past couple of years, we have had several successful deployments of Cassandra, both on premise and in the cloud. I have helped several teams migrate from traditional databases to Cassandra without a hitch. Since it is a fully distributed database with masterless architecture, it works well with a scheduling framework such as Mesos. The toughest challenge one would face when transitioning from a relational database to Cassandra would be to come up with an optimal data model. While Cassandra allows you to have flexible models, it is still vital to ensure you get the maximum performance out of it.
The goal of this book is to teach: how to use Cassandra effectively, powerfully, and efficiently. We'll explore Cassandra's ins and outs by designing the persistence layer for a messaging service that allows users to post status updates that are visible to their friends. By the end of the book, you'll be fully prepared to build your own highly scalable and highly available applications.