Preface
The crop of distributed databases that has come to the market in recent years appeals to application developers for several reasons. Their storage capacity is nearly limitless, bounded only by the number of machines you can afford to spin up. Masterless replication makes them resilient to adverse events, handling even a complete machine failure without any noticeable effect on the applications that rely on them. Log-structured storage engines allow these databases to handle high-volume write loads without blinking an eye.
But compared to traditional relational databases, not to mention newer document stores, distributed databases are typically feature-poor and inconvenient to work with. Read and write functionality is frequently confined to simple key-value operations, with more complex operations demanding arcane map-reduce implementations. Happily, Cassandra provides all of the benefits of a fully-distributed data store while also exposing a familiar, user-friendly data model and query interface.
I first encountered Cassandra working on an application that stored our users' extended social graphs across a variety of services. With a hundred or so alpha users in the system, it became clear that, even at relatively modest traction, our storage needs would go beyond what our PostgreSQL database could comfortably handle. After surveying the landscape of horizontally scalable data stores, we decided to migrate to Cassandra because its table-based data structure seemed to provide an easy migration path from the application we had already built.
Our first deployment of Cassandra ported our previous PostgreSQL schema table-for-table. Only after taking the application to production did we come to realize that our expertise designing schemas for a relational world didn't map directly to a distributed store such as Cassandra. While we were happy to be storing tons of data at very high write volumes, it was clear we weren't getting maximum performance out of the database.
The story has a happy ending: after a rough initial launch, we went back to the drawing board and reworked our data model from the ground up to take advantage of Cassandra's strengths. With that new version deployed, Cassandra effortlessly handled our application scaling to hundreds of thousands of users' social graphs.
The goal of this book is to teach you the easy way what we learned the hard way: how to use Cassandra effectively, powerfully, and efficiently. We'll explore Cassandra's ins and outs by designing the persistence layer for a messaging service that allows users to post status updates that are visible to their friends. By the end of the book, you'll be fully prepared to build your own highly scalable, highly available applications.