Learning Apache Cassandra: Build an efficient, scalable, fault-tolerant, and highly-available data layer into your application using Cassandra

Chapter 1. Getting Up and Running with Cassandra

As an application developer, you have almost certainly worked with databases extensively. You have probably built products using relational databases like MySQL and PostgreSQL, and perhaps experimented with a document store like MongoDB or a key-value database like Redis. While each of these tools has its strengths, it's now time to consider whether a distributed database like Cassandra might be the best choice for the task at hand.

In this chapter, we'll talk about the major reasons to choose Cassandra from among the many database options available to you. Having established that Cassandra is a great choice, we'll go through the nuts and bolts of getting a local Cassandra installation up and running. By the end of this chapter, you'll know:

  • When and why Cassandra is a good choice for your application
  • How to install Cassandra on your development machine
  • How to interact with Cassandra using cqlsh
  • How to create a keyspace

What Cassandra offers, and what it doesn't

Cassandra is a fully distributed, masterless database, offering greater scalability and fault tolerance than traditional single-master databases. Compared with other popular distributed databases like Riak, HBase, and Voldemort, Cassandra offers an unusually robust and expressive interface for modeling and querying data. What follows is an overview of several desirable database capabilities, each accompanied by a discussion of what Cassandra offers in that category.

Horizontal scalability

Horizontal scalability refers to the ability to expand the storage and processing capacity of a database by adding more servers to a database cluster. A traditional single-master database's storage capacity is limited by the capacity of the server that hosts the master instance. If the data set outgrows this capacity, and a more powerful server isn't available, the data set must be sharded among multiple independent database instances that know nothing of each other. Your application bears responsibility for knowing to which instance a given piece of data belongs.

Cassandra, on the other hand, is deployed as a cluster of instances that are all aware of each other. From the client application's standpoint, the cluster is a single entity; the application need not know, nor care, which machine a piece of data belongs to. Instead, data can be read from or written to any instance in the cluster (each instance is referred to as a node); the node that receives the request forwards it to the nodes where the data actually belongs.

The result is that Cassandra deployments have an almost limitless capacity to store and process data; when additional capacity is required, more machines can simply be added to the cluster. When new machines join the cluster, Cassandra takes care of rebalancing the existing data so that each node in the expanded cluster has a roughly equal share.
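To make "any node can route any request" concrete, here is a deliberately simplified model of a token ring. Because every node knows the full ring membership (Cassandra shares it between nodes via gossip), whichever node receives a request can compute where the data lives. This is a sketch under stated assumptions: it hashes with MD5 and gives each node a single token derived from its name, whereas Cassandra uses a configurable partitioner (Murmur3 by default in recent versions) and usually many virtual nodes per machine. The `TokenRing` class and its methods are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (simplified: MD5, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical, simplified ring: each node owns the range ending at its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (token(node), node))

    def owner(self, key: str) -> str:
        # The owner is the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if necessary.
        tokens = [t for t, _ in self._ring]
        i = bisect.bisect_left(tokens, token(key))
        return self._ring[i % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

# Only keys in the range claimed by the new node change owner,
# and every one of them moves to node-d; all other keys stay put.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

The design point is that adding a machine only takes over one slice of the ring instead of reshuffling every key, which is what makes growing a cluster incremental.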

Note

Cassandra is one of several popular distributed databases inspired by the Dynamo architecture, which Amazon described in a 2007 paper. Other widely used implementations of Dynamo include Riak and Voldemort. You can read the original paper at http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf.

High availability

The simplest database deployments are run as a single instance on a single server. This sort of configuration is highly vulnerable to interruption: if the server is affected by a hardware failure or network connection outage, the application's ability to read and write data is completely lost until the server is restored. If the failure is catastrophic, the data on that server might be lost completely.

A master-follower architecture improves this picture a bit. The master instance receives all write operations, and then these operations are replicated to follower instances. The application can read data from the master or any of the follower instances, so a single host becoming unavailable will not prevent the application from continuing to read data. A failure of the master, however, will still prevent the application from performing any write operations, so while this configuration provides high read availability, it does not provide high write availability.

Cassandra, on the other hand, has no single point of failure for reading or writing data. Each piece of data is replicated to multiple nodes, but none of these nodes holds an authoritative master copy. If a machine becomes unavailable, Cassandra continues writing data to the other nodes that share data with that machine, queues the missed operations, and applies them to the failed node when it rejoins the cluster. This means that, in a typical configuration, two nodes holding copies of the same data must fail simultaneously before there is any application-visible interruption in Cassandra's availability.
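The "queue and replay" behavior is known in Cassandra as hinted handoff. The toy model below, with hypothetical `Replica` and `Coordinator` classes, shows only the core idea: the coordinator writes to every live replica, stores a hint for any replica that is down, and delivers the hints when that replica comes back. Real hinted handoff also limits how long hints are kept, among other details this sketch ignores.

```python
from collections import defaultdict


class Replica:
    """Hypothetical replica holding a simple key-value map."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Coordinator:
    """Toy coordinator: writes to live replicas and keeps hints for down ones."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = defaultdict(list)  # replica name -> pending (key, value) writes

    def write(self, key, value):
        acks = 0
        for r in self.replicas:
            if r.up:
                r.apply(key, value)
                acks += 1
            else:
                # Remember the write so it can be delivered later.
                self.hints[r.name].append((key, value))
        return acks

    def replica_recovered(self, replica):
        replica.up = True
        # Replay every queued write for this replica, then forget the hints.
        for key, value in self.hints.pop(replica.name, []):
            replica.apply(key, value)


a, b, c = Replica("a"), Replica("b"), Replica("c")
coord = Coordinator([a, b, c])

c.up = False                          # one replica fails
acks = coord.write("status:1", "Hello")
print(acks)                           # 2: the write still succeeds on the live replicas
coord.replica_recovered(c)            # c rejoins and receives the queued write
print(c.data)                         # {'status:1': 'Hello'}
```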

Tip

How many copies?

When you create a keyspace—Cassandra's version of a database—you specify how many copies of each piece of data should be stored; this is called the replication factor. A replication factor of 3 is a common and good choice for many use cases.
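The replication factor also determines how many replica failures a request can survive, which is the arithmetic behind the earlier statement that, in a typical configuration, two nodes must fail at once before availability suffers. The sketch assumes a request must be acknowledged by a fixed number of replicas: one for ONE, a majority (RF // 2 + 1) for QUORUM, and all of them for ALL. The level names are Cassandra's; the functions are hypothetical.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replicas that must respond for a request at the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def tolerated_failures(level: str, rf: int) -> int:
    """How many replicas of one piece of data can be down while requests still succeed."""
    return rf - required_replicas(level, rf)


for level in ("ONE", "QUORUM", "ALL"):
    print(level, tolerated_failures(level, rf=3))
# ONE 2
# QUORUM 1
# ALL 0
```

With a replication factor of 3 and QUORUM requests, one failed replica is absorbed; it takes a second failure among the same data's replicas to interrupt service.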

Write optimization

Traditional relational and document databases are optimized for read performance. Writing data to a relational database will typically involve making in-place updates to complicated data structures on disk, in order to maintain a data structure that can be read efficiently and flexibly. Updating these data structures is a very expensive operation from the standpoint of disk I/O, which is often the limiting factor for database performance. Since writes are more expensive than reads, you'll typically avoid any unnecessary updates to a relational database, even at the expense of extra read operations.

Cassandra, on the other hand, is highly optimized for write throughput, and in fact never modifies data files in place; it only appends to existing files or creates new ones. This is much easier on disk I/O and means that Cassandra can provide astonishingly high write throughput. Since both writing data to Cassandra and storing data in Cassandra are inexpensive, denormalization carries little cost and is a good way to ensure that data can be read efficiently in various access scenarios.
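The following toy model illustrates the append-only idea. It is a simplification: in Cassandra, writes go to a commit log and an in-memory memtable that is later flushed to immutable SSTable files, and compaction merges those files into new ones. Here a single Python list stands in for the files on disk, and the class name is hypothetical. Note that a write never reads the existing value first.

```python
import itertools


class AppendOnlyStore:
    """Toy log-structured store: writes only append; nothing is updated in place."""

    def __init__(self):
        self.log = []                    # stands in for append-only files on disk
        self._clock = itertools.count()  # monotonically increasing write timestamps

    def write(self, key, value):
        # No read of the existing value is needed before writing.
        self.log.append((key, next(self._clock), value))

    def read(self, key):
        # The entry with the newest timestamp for a key wins.
        matches = [(ts, value) for k, ts, value in self.log if k == key]
        return max(matches)[1] if matches else None

    def compact(self):
        # Build a new log keeping only the newest entry per key,
        # analogous to merging files into a new one during compaction.
        latest = {}
        for key, ts, value in self.log:
            if key not in latest or ts > latest[key][0]:
                latest[key] = (ts, value)
        self.log = [(k, ts, v) for k, (ts, v) in latest.items()]


store = AppendOnlyStore()
store.write("user:alice", "v1")
store.write("user:alice", "v2")   # a redundant overwrite is just another append
print(store.read("user:alice"))   # v2
print(len(store.log))             # 2
store.compact()
print(len(store.log))             # 1
```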

Note

Because Cassandra is optimized for write volume, you shouldn't shy away from writing data to the database. In fact, it's most efficient to write without reading whenever possible, even if doing so might result in redundant updates.

Being optimized for writes doesn't make Cassandra bad at reads; in fact, a well-designed Cassandra database can handle very heavy read loads with no problem. We'll cover efficient data modeling in depth in the next few chapters.

Structured records

The first three database features we looked at are commonly found in distributed data stores. However, databases like Riak and Voldemort are purely key-value stores; these databases have no knowledge of the internal structure of a record that's stored at a particular key. This means that useful operations such as updating only part of a record, reading only certain fields from a record, or retrieving records that contain a particular value in a given field are not possible.

Relational databases like PostgreSQL, document stores like MongoDB, and, to a limited extent, newer key-value stores like Redis do have a concept of the internal structure of their records, and most application developers are accustomed to taking advantage of the possibilities this allows. None of these databases, however, offers the advantages of a masterless distributed architecture.

In Cassandra, records are structured much in the same way as they are in a relational database—using tables, rows, and columns. Thus, applications using Cassandra can enjoy all the benefits of masterless distributed storage while also getting all the advanced data modeling and access features associated with structured records.

Secondary indexes

A secondary index, commonly referred to as an index in the context of a relational database, is a structure allowing efficient lookup of records by some attribute other than their primary key. This is a widely useful capability: for instance, when developing a blog application, you would want to be able to easily retrieve all of the posts written by a particular author. Cassandra supports secondary indexes; while Cassandra's version is not as versatile as indexes in a typical relational database, it's a powerful feature in the right circumstances.
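Conceptually, a secondary index is a second lookup structure that maps an attribute value back to primary keys. The sketch below models that for the blog example with plain Python dictionaries; it shows the idea only and says nothing about how Cassandra stores its indexes internally. All names are hypothetical.

```python
from collections import defaultdict

posts = {}                          # primary key (post id) -> record
posts_by_author = defaultdict(set)  # secondary index: author -> post ids


def insert_post(post_id, author, title):
    posts[post_id] = {"author": author, "title": title}
    # Every write also has to keep the index in step with the primary data.
    posts_by_author[author].add(post_id)


def posts_written_by(author):
    # Look up primary keys in the index, then fetch each record by primary key.
    return sorted(posts[pid]["title"] for pid in posts_by_author.get(author, ()))


insert_post(1, "alice", "Hello Cassandra")
insert_post(2, "bob", "Tuning compaction")
insert_post(3, "alice", "Modeling time series")
print(posts_written_by("alice"))  # ['Hello Cassandra', 'Modeling time series']
```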

Efficient result ordering

It's quite common to want to retrieve a record set ordered by a particular field; for instance, a photo sharing service will want to retrieve the most recent photographs in descending order of creation. Since sorting data on the fly is a fundamentally expensive operation, databases must keep information about record ordering persisted on disk in order to efficiently return results in order. In a relational database, this is one of the jobs of a secondary index.

In Cassandra, secondary indexes can't be used for result ordering, but tables can be structured such that rows are always kept sorted by a given column or columns, called clustering columns. Sorting by arbitrary columns at read time is not possible, but the ability to keep records efficiently sorted in a predefined order, and to retrieve ranges of records based on this ordering, is an unusually powerful capability for a distributed database.
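The sketch below models the idea behind clustering columns for the photo example: within one partition, rows are kept sorted by the clustering key as they are written, so "the most recent N" or a range becomes a slice rather than a sort. It is an in-memory simplification with hypothetical names, assuming photos are clustered by an integer creation time; Cassandra's on-disk structures differ.

```python
import bisect


class Partition:
    """Rows kept sorted by a clustering key at write time (toy model)."""

    def __init__(self):
        self._keys = []   # clustering keys, always sorted ascending
        self._rows = []   # row values, parallel to _keys

    def insert(self, clustering_key, row):
        i = bisect.bisect_left(self._keys, clustering_key)
        if i < len(self._keys) and self._keys[i] == clustering_key:
            self._rows[i] = row            # same key: overwrite the row
        else:
            self._keys.insert(i, clustering_key)
            self._rows.insert(i, row)

    def range(self, start, end):
        # Rows with start <= key < end, located by binary search; no sorting needed.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._rows[lo:hi]

    def latest(self, n):
        # Descending order by clustering key: read the sorted rows backwards.
        return self._rows[::-1][:n]


photos = Partition()
for created_at, name in [(1700, "beach.jpg"), (1500, "cat.jpg"), (1900, "sunset.jpg")]:
    photos.insert(created_at, name)

print(photos.latest(2))          # ['sunset.jpg', 'beach.jpg']
print(photos.range(1500, 1800))  # ['cat.jpg', 'beach.jpg']
```

The ordering is fixed when the data is written, which is why only the clustering columns chosen up front can be used to order results.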

Immediate consistency

When we write a piece of data to a database, it is our hope that that data is immediately available to any other process that may wish to read it. From another point of view, when we read some data from a database, we would like to be guaranteed that the data we retrieve is the most recently updated version. This guarantee is called immediate consistency, and it's a property of most common single-master databases like MySQL and PostgreSQL.

Distributed systems like Cassandra typically do not provide an immediate consistency guarantee. Instead, developers must be willing to accept eventual consistency, which means that when data is updated, the system will reflect that update at some point in the future. Developers accept this because, in a distributed system, immediate consistency trades off directly against high availability.

In the case of Cassandra, that tradeoff is made explicit through tunable consistency. Each time you design a write or read path for data, you have the option of immediate consistency with less resilient availability, or eventual consistency with extremely resilient availability. We'll cover consistency tuning in great detail in Chapter 10, How Cassandra Distributes Data.
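A common way to reason about tunable consistency is the overlap rule: if a write is acknowledged by W replicas and a read consults R replicas, out of a replication factor RF, then R + W > RF guarantees that the two sets share at least one replica, so the read can see the latest write. The brute-force check below (a hypothetical helper, not part of any driver) verifies the rule by enumerating every pair of replica sets; it models only this set arithmetic, not failures or timing.

```python
from itertools import combinations


def always_overlaps(rf: int, r: int, w: int) -> bool:
    """True if every R-replica read set intersects every W-replica write set."""
    replicas = range(rf)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


rf = 3
quorum = rf // 2 + 1
print(always_overlaps(rf, r=quorum, w=quorum))  # True:  2 + 2 > 3
print(always_overlaps(rf, r=1, w=1))            # False: 1 + 1 <= 3
print(always_overlaps(rf, r=1, w=rf))           # True:  write to all, read from one
```

Reading and writing at QUORUM is therefore one way to get immediate consistency, at the cost of needing a majority of replicas to be reachable.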

Discretely writable collections

While it's useful for records to be internally structured into discrete fields, a given property of a record isn't always a single value like a string or an integer. One simple way to handle fields that contain collections of values is to serialize them using a format like JSON, and then save the serialized collection into a text field. However, in order to update collections stored in this way, the serialized data must be read from the database, decoded, modified, and then written back to the database in its entirety. If two clients try to perform this kind of modification to the same record concurrently, one of the updates will be overwritten by the other.

For this reason, many databases offer built-in collection structures that can be updated discretely: values can be added to and removed from a collection without reading and rewriting the entire collection. Cassandra is no exception, offering list, set, and map collections, and supporting operations like "append the number 3 to the end of this list". Neither the client nor Cassandra itself needs to read the current state of the collection in order to update it, meaning collection updates are also blazingly efficient.
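The lost-update problem, and why discrete collection operations avoid it, can be reproduced in a few lines. This sketch simulates two clients interleaving their steps against one stored record; `append_to_list` is a hypothetical stand-in for a server-side operation, not Cassandra's API. In the first approach each client reads the JSON text, modifies it, and writes it back; in the second, each client sends only "append this value".

```python
import json

# Approach 1: the collection is serialized as JSON in a text field.
record = {"tags": json.dumps(["cassandra"])}

# Both clients read the same starting value...
seen_by_client_1 = json.loads(record["tags"])
seen_by_client_2 = json.loads(record["tags"])

# ...modify their own copies...
seen_by_client_1.append("nosql")
seen_by_client_2.append("databases")

# ...and write the whole collection back. The second write overwrites the first.
record["tags"] = json.dumps(seen_by_client_1)
record["tags"] = json.dumps(seen_by_client_2)
print(json.loads(record["tags"]))  # ['cassandra', 'databases']  ('nosql' was lost)

# Approach 2: the store applies an "append" operation itself,
# so neither client needs to read the current collection first.
tags = ["cassandra"]


def append_to_list(collection, value):
    collection.append(value)


append_to_list(tags, "nosql")      # client 1
append_to_list(tags, "databases")  # client 2
print(tags)  # ['cassandra', 'nosql', 'databases']
```

In CQL, the discrete version is expressed inside an UPDATE statement as `SET tags = tags + ['nosql']`.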

Relational joins

In real-world applications, different pieces of data relate to each other in a variety of ways. Relational databases allow us to perform queries that make these relationships explicit; for instance, to retrieve a set of events whose location is in the state of New York (assuming events and locations are different record types). Cassandra, however, is not a relational database, and does not support anything like joins. Instead, applications using Cassandra typically denormalize data and make clever use of clustering in order to perform the sorts of data access that would use a join in a relational database.

For data sets that aren't already denormalized, applications can also perform client-side joins, which mimic the behavior of a relational database by performing multiple queries and joining the results at the application level. Client-side joins are less efficient than reading data that has been denormalized in advance, but offer more flexibility. We'll cover both of these approaches in Chapter 6, Denormalizing Data for Maximum Performance.
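A client-side join of the New York example might look like the following sketch. The dictionaries and lists stand in for hypothetical tables: `locations` for the result of a first query, and `events_by_location` for events partitioned by location ID. The application combines them itself, issuing one extra lookup per location. The denormalized alternative stores the events again, keyed by state, so the same question takes a single lookup.

```python
# Hypothetical stand-ins for query results.
locations = [
    {"location_id": 1, "city": "Buffalo", "state": "NY"},
    {"location_id": 2, "city": "Boston", "state": "MA"},
    {"location_id": 3, "city": "Brooklyn", "state": "NY"},
]
events_by_location = {
    1: ["Wing festival"],
    2: ["Harbor tour"],
    3: ["Jazz night"],
}


def events_in_state(state):
    """Client-side join: one query for locations, then one query per location."""
    location_ids = [loc["location_id"] for loc in locations if loc["state"] == state]
    results = []
    for location_id in location_ids:
        # Each lookup here stands in for a separate round trip to the database.
        results.extend(events_by_location.get(location_id, []))
    return results


# Denormalized alternative: events written a second time, keyed by state.
events_by_state = {"NY": ["Wing festival", "Jazz night"], "MA": ["Harbor tour"]}

print(events_in_state("NY"))  # ['Wing festival', 'Jazz night']
print(events_by_state["NY"])  # ['Wing festival', 'Jazz night']
```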

MapReduce

MapReduce is a technique for performing aggregate processing on large amounts of data in parallel; it's a particularly common technique in data analytics applications. Cassandra does not offer built-in MapReduce capabilities, but it can be integrated with Hadoop in order to perform MapReduce operations across Cassandra data sets, or with Spark for real-time data analysis. The DataStax Enterprise product provides integration with both of these tools out-of-the-box.

Comparing Cassandra to the alternatives

Now that you've got an in-depth understanding of the feature set that Cassandra offers, it's time to figure out which features are most important to you, and which database is the best fit. The following table lists a handful of commonly used databases, and key features that they do or don't have:

| Feature | Cassandra | PostgreSQL | MongoDB | Redis | Riak |
|---|---|---|---|---|---|
| Structured records | Yes | Yes | Yes | Limited | No |
| Secondary indexes | Yes | Yes | Yes | No | Yes |
| Discretely writable collections | Yes | Yes | Yes | Yes | No |
| Relational joins | No | Yes | No | No | No |
| Built-in MapReduce | No | No | Yes | No | Yes |
| Fast result ordering | Yes | Yes | Yes | Yes | No |
| Immediate consistency | Configurable at query level | Yes | Yes | Yes | Configurable at cluster level |
| Transparent sharding | Yes | No | Yes | No | Yes |
| No single point of failure | Yes | No | No | No | Yes |
| High throughput writes | Yes | No | No | Yes | Yes |

As you can see, Cassandra offers a unique combination of scalability, availability, and a rich set of features for modeling and accessing data.

Installing Cassandra

Now that you're acquainted with Cassandra's substantial powers, you're no doubt chomping at the bit to try it out. Happily, Cassandra is free, open source, and very easy to get running.

Installing on Mac OS X

First, we need to make sure that we have an up-to-date installation of the Java Runtime Environment. Open the Terminal application, and type the following into the command prompt:

$ java -version

You will see an output that looks something like the following:

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

Note

Pay particular attention to the java version listed: if it's lower than 1.7.0_25, you'll need to install a new version. If you have an older version of Java or if Java isn't installed at all, head to https://www.java.com/en/download/mac_download.jsp and follow the download instructions on the page.

You'll need to set up your environment so that Cassandra knows where to find the latest version of Java. To do this, set your JAVA_HOME environment variable to the install location, and add the bin directory of your new Java installation to your PATH, as follows:

$ export JAVA_HOME="/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home"
$ export PATH="$JAVA_HOME/bin:$PATH"

You should add these two lines to the bottom of your ~/.bash_profile file (Terminal on OS X starts login shells, which read ~/.bash_profile rather than ~/.bashrc) to ensure that things still work when you open a new terminal.

Note

The installation instructions given earlier assume that you're using the latest version of Mac OS X (at the time of writing, 10.10 Yosemite). If you're running a different version of OS X, installing Java might require different steps. Check out https://www.java.com/en/download/faq/java_mac.xml for detailed installation information.

Once you've got the right version of Java, you're ready to install Cassandra. It's very easy to install Cassandra using Homebrew; simply type the following:

$ brew install cassandra
$ pip install cassandra-driver cql
$ cassandra

Here's what we just did:

  • Installed Cassandra using the Homebrew package manager
  • Installed the CQL shell and its dependency, the Python Cassandra driver
  • Started the Cassandra server

Installing on Ubuntu

First, we need to make sure that we have an up-to-date installation of the Java Runtime Environment. Open the Terminal application, and type the following into the command prompt:

$ java -version

You will see an output that looks similar to the following:

java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.14.04.1)
OpenJDK 64-bit Server VM (build 24.65-b04, mixed mode)

Note

Pay particular attention to the java version listed: it should start with 1.7. If you have an older version of Java, or if Java isn't installed at all, you can install the correct version using the following command:

$ sudo apt-get install openjdk-7-jre-headless

Once you've got the right version of Java, you're ready to install Cassandra. First, you need to add Apache's Debian repositories to your sources list. Add the following lines to the file /etc/apt/sources.list:

deb http://www.apache.org/dist/cassandra/debian 21x main
deb-src http://www.apache.org/dist/cassandra/debian 21x main

In the Terminal application, type the following into the command prompt:

$ gpg --keyserver pgp.mit.edu --recv-keys F758CE318D77295D
$ gpg --export --armor F758CE318D77295D | sudo apt-key add -
$ gpg --keyserver pgp.mit.edu --recv-keys 2B5C1B00
$ gpg --export --armor 2B5C1B00 | sudo apt-key add -
$ gpg --keyserver pgp.mit.edu --recv-keys 0353B12C
$ gpg --export --armor 0353B12C | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install cassandra
$ cassandra

Here's what we just did:

  • Added the Apache repositories for Cassandra 2.1 to our sources list
  • Added the public keys for the Apache repo to our system and updated our repository cache
  • Installed Cassandra
  • Started the Cassandra server

Installing on Windows

The easiest way to install Cassandra on Windows is to use the DataStax Community Edition. DataStax is a company that provides enterprise-level support for Cassandra; they also release Cassandra packages at both free and paid tiers. DataStax Community Edition is free, and does not differ from the Apache package in any meaningful way.

DataStax offers a graphical installer for Cassandra on Windows, which is available for download at planetcassandra.org/cassandra. On this page, locate Windows Server 2008/Windows 7 or Later (32-Bit) in the Operating System menu (or the 64-bit version, if you run a 64-bit version of Windows), and choose MSI Installer (2.x) from the version columns.

Download and run the MSI file, and follow the instructions, accepting the defaults.

Once the installer finishes, you should have an installation of Cassandra running on your machine.

Bootstrapping the project

Throughout the remainder of this book, we will build an application called MyStatus, which allows users to post status updates for their friends to read. In each chapter, we'll add new functionality to the MyStatus application; each new feature will also introduce a new aspect of Cassandra.

CQL – the Cassandra Query Language

Since this is a book about Cassandra and not targeted to users of any particular programming language or application framework, we will focus entirely on the database interactions that MyStatus will require. Code examples will be in Cassandra Query Language (CQL). Specifically, we'll use version 3.1.1 of CQL, which is available in Cassandra 2.0.6 and later versions.

As the name implies, CQL is heavily inspired by SQL; in fact, many CQL statements are equally valid SQL statements. However, CQL and SQL are not interchangeable. CQL lacks a grammar for relational features such as JOIN statements, which are not possible in Cassandra. Conversely, CQL is not a subset of SQL; constructs for retrieving the update time of a given column, or performing an update in a lightweight transaction, which are available in CQL, do not have an SQL equivalent.

Note

Throughout this book, you'll learn the important constructs of CQL. Once you've finished reading this book, I recommend turning to the DataStax CQL documentation for additional reference. This documentation is available at http://www.datastax.com/documentation/cql/3.1.

Interacting with Cassandra

Most common programming languages have drivers for interacting with Cassandra. When selecting a driver, you should look for libraries that support the CQL binary protocol, which is the latest and most efficient way to communicate with Cassandra.

Tip

The CQL binary protocol is a relatively new introduction; older versions of Cassandra used the Thrift protocol as a transport layer. Although Cassandra continues to support Thrift, avoid Thrift-based drivers, as they are less performant than the binary protocol.

Here are CQL binary drivers available for some popular programming languages:

| Language | Driver | Available at |
|---|---|---|
| Java | DataStax Java Driver | github.com/datastax/java-driver |
| Python | DataStax Python Driver | github.com/datastax/python-driver |
| Ruby | DataStax Ruby Driver | github.com/datastax/ruby-driver |
| C++ | DataStax C++ Driver | github.com/datastax/cpp-driver |
| C# | DataStax C# Driver | github.com/datastax/csharp-driver |
| JavaScript (Node.js) | node-cassandra-cql | github.com/jorgebay/node-cassandra-cql |
| PHP | phpbinarycql | github.com/rmcfrazier/phpbinarycql |

While you will likely use one of these drivers in your applications, to try out the code examples in this book, you can simply use the cqlsh tool, which is a command-line interface for executing CQL queries and viewing the results. To start cqlsh on OS X or Linux, simply type cqlsh into your command line; you should see something like this:

$ cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>

On Windows, you can start cqlsh by finding the Cassandra CQL Shell application in the DataStax Community Edition group in your applications. Once you open it, you should see the same output we just saw.

Creating a keyspace

A keyspace is a collection of related tables, equivalent to a database in a relational system. To create the keyspace for our MyStatus application, issue the following statement in the CQL shell:

CREATE KEYSPACE "my_status"
WITH REPLICATION = {
  'class': 'SimpleStrategy', 'replication_factor': 1
};

Here we created a keyspace called my_status, which we will use for the remainder of this book. When we create a keyspace, we have to specify replication options. Cassandra provides several strategies for managing replication of data; SimpleStrategy is the best strategy as long as your Cassandra deployment does not span multiple data centers. The replication_factor value tells Cassandra how many copies of each piece of data are to be kept in the cluster; since we are only running a single instance of Cassandra, there is no point in keeping more than one copy of the data. In a production deployment, you would certainly want a higher replication factor; 3 is a good place to start.
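To illustrate what replication_factor controls, here is a sketch of the placement rule SimpleStrategy follows: the first replica goes on the node that owns the row's token, and each additional replica goes on the next node clockwise around the ring, without regard to racks or data centers. The ring is a hypothetical sorted list of (token, node) pairs with small integer tokens chosen for readability; real tokens come from the partitioner.

```python
import bisect

# Hypothetical ring: (token, node) pairs sorted by token.
RING = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]


def simple_strategy_replicas(key_token: int, replication_factor: int) -> list:
    """Sketch of SimpleStrategy placement: owner first, then the next nodes clockwise."""
    tokens = [t for t, _ in RING]
    # Owner: first node whose token is >= the key's token, wrapping around.
    start = bisect.bisect_left(tokens, key_token) % len(RING)
    # This sketch simply caps the copy count at the number of nodes.
    count = min(replication_factor, len(RING))
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


print(simple_strategy_replicas(30, 1))  # ['node3']
print(simple_strategy_replicas(30, 3))  # ['node3', 'node4', 'node1']
print(simple_strategy_replicas(80, 3))  # ['node1', 'node2', 'node3']
```

With the single-node setup used in this chapter, a replication factor of 1 is all the ring can hold; on a larger cluster, raising it to 3 puts each row on three consecutive nodes.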

Note

A few things at this point are worth noting about CQL's syntax:

  • It's syntactically very similar to SQL; as we further explore CQL, the impression of similarity will not diminish.
  • Double quotes are used for identifiers such as keyspace, table, and column names. As in SQL, quoting identifier names is usually optional, unless the identifier is a keyword or contains a space or another character that will trip up the parser.
  • Single quotes are used for string literals; the key-value structure we use for replication is a map literal, which is syntactically similar to an object literal in JSON.
  • As in SQL, CQL statements in the CQL shell must terminate with a semicolon.

Selecting a keyspace

Once you've created a keyspace, you'll want to use it. To do so, issue the USE command:

USE "my_status";

This tells Cassandra that all future commands will implicitly refer to tables inside the my_status keyspace. If you close the CQL shell and reopen it, you'll need to reissue this command.

Summary

In this chapter, you explored the reasons to choose Cassandra from among the many databases available, and having determined that Cassandra is a great choice, you installed it on your development machine.

You had your first taste of the Cassandra Query Language when you issued your first command via the CQL shell in order to create a keyspace. You're now poised to begin working with Cassandra in earnest.

In the next chapter, we'll begin building the MyStatus application, starting out with a simple table to model users. We'll cover a lot more CQL commands, and before you know it, you'll be reading and writing data like a pro.


Description

If you're an application developer familiar with SQL databases such as MySQL or Postgres, and you want to explore distributed databases such as Cassandra, this is the perfect guide for you. Even if you've never worked with a distributed database before, Cassandra's intuitive programming interface coupled with the step-by-step examples in this book will have you building highly scalable persistence layers for your applications in no time.

What you will learn

  • Install Cassandra and create your first keyspace
  • Choose the right table structure for the task at hand in a variety of scenarios
  • Use range slice queries for efficient data access
  • Effortlessly handle concurrent updates with collection columns
  • Ensure data integrity with lightweight transactions and logged batches
  • Understand eventual consistency and use the right consistency level for your situation
  • Implement best practices for data modeling and access

Product Details

Publication date: Feb 25, 2015
Length: 246 pages
Edition: 1st
Language: English
ISBN-13: 9781783989201
Vendor: Apache

Table of Contents

13 Chapters
1. Getting Up and Running with Cassandra
2. The First Table
3. Organizing Related Data
4. Beyond Key-Value Lookup
5. Establishing Relationships
6. Denormalizing Data for Maximum Performance
7. Expanding Your Data Model
8. Collections, Tuples, and User-defined Types
9. Aggregating Time-Series Data
10. How Cassandra Distributes Data
A. Peeking Under the Hood
B. Authentication and Authorization
Index

Customer reviews

Top Reviews
Rating distribution
4.4 out of 5 (13 ratings)
5 star 38.5%
4 star 61.5%
3 star 0%
2 star 0%
1 star 0%

Amazon Customer Oct 07, 2017
5 out of 5 stars
The book covers Cassandra features and CQL very well. It also touches on the internal workings of Cassandra. However, the book doesn't touch on any of the programming language APIs available for communicating with Cassandra. It would have been good if an additional chapter had been dedicated to a programming language API such as Java's.
Amazon Verified review
Jascha Casadio Dec 20, 2015
5 out of 5 stars
Sooner or later, every professional working with NoSQL databases will meet a beauty called Cassandra, a distributed database management system initially developed by Facebook and now a top-level project at the Apache Foundation. Widely acclaimed for its features, Cassandra is a beast that is difficult to tame. Next to tech giants such as Facebook, Twitter, and Ubisoft are a myriad of start-ups selling data-driven products that rely on Cassandra. The high demand for professionals able to properly set it up and tune it has resulted in a lot of available documentation, plus a very active community ready to improve it and support newcomers. Among the many titles available to us at the bookstore is Learning Apache Cassandra by Mat Brown, released in early 2015, a title that covers the latest features and targets beginners, be they professionals or simply curious people interested in a quick taste of what Cassandra is and what it is capable of.
Spanning fewer than 300 pages, Learning Cassandra is a short but very intense book. By the time the reader gets to the back cover, he will feel that he has been exposed to a lot of information and, above all, that he has gradually been taken, step by step, through the very basics of this technology, without any abrupt change of topic and without feeling lost. The book written by Mat is easy to follow and comprehensive. Through a learn-by-doing approach, he introduces each and every concept, starting with an overview of what Cassandra is and how it differs from other solutions, up to topics such as lightweight transactions and collections, passing through the different data types. The different topics are presented to the reader through a single running example, an application that closely resembles some basic features of Facebook. The examples are explained step by step. Nothing is taken for granted.
A couple of things deserve praise: the first is the author's choice to present Cassandra through CQL rather than through the bindings of a specific programming language, which would make it less useful and interesting to most of us; the second is the many concepts presented first in SQL, then in Cassandra. Here Mat beautifully shows the differences between a NoSQL database and an old-school relational database. The author often highlights the importance of denormalizing the data and of stating the questions we want to answer first, then building our queries around them.
Since this is a book for beginners, it must be said that this title does not present any real-world example. Similarly, Mat uses a test cluster made of a single machine, which is far from what we find in production. This rules out all the worries about replication, consensus, and so on.
Overall, this is by far the friendliest introduction to Cassandra that I have read so far. While it does not provide anything related to administration and configuration, nor anything about real-world scenarios, it is a perfect companion for anyone interested in learning what Cassandra is and how to get it to work.
As usual, you can find more reviews on my personal blog: http://books.lostinmalloc.com. Feel free to pass by and share your thoughts!
Amazon Verified review
AJ Jun 22, 2015
5 out of 5 stars
Thanks to Mat, this book is awesome. I have read almost all the available books on Cassandra, and somehow I missed this one; it happens to be my latest read.
I have one question for Mat about p. 170 of the book. While the diagram seems to be correct, the explanation between the two diagrams on this page looks incorrect to me. I could be mistaken. In my opinion, the explanation should say that Alice is on Node3, Bob is on Node1, and Ivan is on Node2. Am I missing something? If anyone could clarify, I would appreciate it.
Five stars for this book and the author.
Amazon Verified review
Amazon Customer Jan 21, 2016
5 out of 5 stars
I am completely new to Apache Cassandra. I just decided to learn it out of my own interest. I purchased the Kindle edition of this book 3 days ago. So far I have completed the fifth chapter, and I must say that it has been a long time since I read such a good book.
The contents are perfectly organized.
Lots of real-life examples and solutions.
No unnecessary descriptions or irrelevant data.
The language and presentation style are so simple that beginners will love reading it.
I am completely satisfied with this book.
Amazon Verified review
Nagender Dec 25, 2016
5 out of 5 stars
Good book.
Amazon Verified review
Get free access to the Packt library of over 7,500 books and video courses for 7 days!

FAQs

What is included in a Packt subscription?

A subscription provides you with full access to view all Packt and licensed content online, including exclusive access to Early Access titles. Depending on the tier chosen, you can also earn credits and discounts to use toward owning content.

How can I cancel my subscription?

To cancel your subscription, go to the account page, found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription. From there, you will see the ‘cancel subscription’ button in the grey box containing your subscription information.

What are credits?

Credits can be earned by reading 40 sections of any title within the payment cycle (a month starting from the day of subscription payment). You also earn a Credit every month if you subscribe to our annual or 18-month plans. Credits can be used to buy DRM-free books, in the same way that you would pay for a book. Your credits can be found on the subscription homepage (subscription.packtpub.com) by clicking the ‘My Library’ dropdown and selecting ‘Credits’.

What happens if an Early Access course is cancelled?

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title?

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles?

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date?

The publication date is as accurate as we can make it at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and the more chapters are delivered, the more accurate the estimate becomes.

How will I know when new chapters are ready?

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access?

Yes, all Early Access content is fully available through your subscription. You will need a paid or active trial subscription in order to access all titles.

How is Early Access delivered?

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content?

Early Access is a way for us to get our content to you more quickly, but the method of buying an Early Access course is still the same. Just find the course you want to buy, go through the checkout steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access course.

What is Early Access?

Keeping up to date with the latest technology is difficult: new versions, new frameworks, new techniques. This feature gives you a head start on our content as it's being created. With Early Access you'll receive each chapter as it's written and get regular updates throughout the product's development, as well as the final course as soon as it's ready.

We created Early Access as a means of giving you the information you need as soon as it's available. As we go through the process of developing a course, 99% of it can be ready, but we can't publish until that last 1% falls into place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.