Cassandra Design Patterns
Build real-world, industry-strength data storage solutions with time-tested design methodologies using Cassandra

Product type: Paperback
Published: Nov 2015
ISBN-13: 9781785285707
Length: 168 pages
Edition: 2nd Edition
Author: Rajanarayanan Thottuvaikkatumana

A brief overview of Cassandra

Where do you start with Cassandra? The best place is to look at the new application development requirements and take it from there. Look for cases where RDBMS tables would need to be denormalized to keep together data items that an RDBMS design would distribute across several tables. If an application writes a set of data items together into a data store, why separate them out? There is no need to worry about redundancy. This is the NoSQL philosophy, and it is a new way to look at data modeling in NoSQL data stores. Cassandra supports fast writes and reads. Early versions of Cassandra had some performance problems, but a large number of optimizations have gone into making the latest versions perform much better for both reads and writes. Consuming extra space is not a major concern, because secondary storage keeps getting cheaper. A word of caution, though: it is fine to write data into Cassandra with whatever level of redundancy, but the data access use cases have to be thought through carefully before you commit to a Cassandra data model. The data is stored on disk to be read at a later date, and those reads have to be efficient and return the required data in the desired sort order.

In a nutshell, decide how you want to store the data, and make sure that the way it is stored gives you the data in the desired sort order. There is no hard and fast rule for this; it depends purely on the application requirements. That is the other shift in the thought process.
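To make the idea of storing data in the order it will be read a little more concrete, here is a minimal in-memory Python sketch. It is not Cassandra code: a table is modeled as a set of partitions whose rows are kept sorted by a clustering key at write time, so reading the latest events for one user is a single lookup with no sorting step. The class name `SortedPartitionTable` and the user-activity data are hypothetical illustrations; in Cassandra itself, this layout would be declared through the table's partition key and clustering columns (for example, with `WITH CLUSTERING ORDER BY`).

```python
import bisect
from collections import defaultdict


class SortedPartitionTable:
    """Hypothetical toy model of a Cassandra-style table: rows are grouped by
    partition key and kept sorted by clustering key on write, so reads
    never have to sort."""

    def __init__(self, descending=False):
        self.descending = descending
        self.partitions = defaultdict(list)  # partition key -> sorted rows

    def insert(self, partition_key, clustering_key, value):
        # Negating a numeric clustering key yields descending order
        # from an ascending sorted list (assumes numeric keys).
        sort_key = -clustering_key if self.descending else clustering_key
        bisect.insort(self.partitions[partition_key], (sort_key, value))

    def read(self, partition_key, limit=None):
        # A read touches exactly one partition; rows are already in order.
        rows = self.partitions.get(partition_key, [])
        selected = rows if limit is None else rows[:limit]
        return [value for _, value in selected]


# Hypothetical usage: user activity, newest event first.
events = SortedPartitionTable(descending=True)
events.insert("user-42", 1001, "login")
events.insert("user-42", 1003, "purchase")
events.insert("user-42", 1002, "view_item")
events.insert("user-7", 1002, "login")

print(events.read("user-42", limit=2))  # ['purchase', 'view_item']
```

The cost of ordering is paid once, at write time, which matches the trade-off described above: writes do a little more work so that the reads the application cares about are cheap and already sorted.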

Instead of thinking purely from the data model perspective, start thinking from the application's perspective: how the application generates the data, what the read requirements are, what the write requirements are, what response time is expected for each use case, and so on. Design the data model based on these aspects. In the big data world, the application becomes the first-class citizen, and the data model no longer occupies the driving seat in application design. Design the data model to serve the needs of the applications.
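One way to read "design the data model to serve the application" is: list the queries first, then give each query its own denormalized table and let the write path fill all of them. The following Python sketch illustrates that pattern in memory under stated assumptions. The `Order` record, the `OrderStore` class, and the two queries (order history per customer, orders per day) are hypothetical examples, not taken from the book.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    day: str  # hypothetical ISO date string, e.g. "2015-11-02"
    amount: float


class OrderStore:
    """Hypothetical query-first model: one denormalized 'table' per read query."""

    def __init__(self):
        # Query 1: a customer's orders, most recently placed first.
        self.orders_by_customer = defaultdict(list)
        # Query 2: all orders placed on a given day, in arrival order.
        self.orders_by_day = defaultdict(list)

    def place_order(self, order):
        # The write path duplicates the order into every table that serves a query,
        # so each read below is a single-key lookup with no joins.
        self.orders_by_customer[order.customer_id].insert(0, order)
        self.orders_by_day[order.day].append(order)

    def customer_history(self, customer_id):
        return list(self.orders_by_customer.get(customer_id, []))

    def daily_orders(self, day):
        return list(self.orders_by_day.get(day, []))


store = OrderStore()
store.place_order(Order("o1", "c1", "2015-11-01", 20.0))
store.place_order(Order("o2", "c1", "2015-11-02", 35.5))
store.place_order(Order("o3", "c2", "2015-11-02", 12.0))

print([o.order_id for o in store.customer_history("c1")])      # ['o2', 'o1']
print([o.order_id for o in store.daily_orders("2015-11-02")])  # ['o2', 'o3']
```

The redundancy is deliberate: each order is stored twice, but each query reads exactly one pre-shaped structure.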

In any organization, new reporting requirements arrive all the time. The major challenge in generating reports is the underlying data store. In the RDBMS world, reporting is always a challenge: you may have to join multiple tables to generate even simple reports. Even though RDBMS objects such as views, stored procedures, and indexes can be used to get the desired data for the reports, the query plan is very complex most of the time when the report is generated. Processing power consumption is another concern when such reports are generated on the fly. Because of these complexities, it is common to keep separate reporting tables containing data exported from the transactional tables. Martin Fowler emphasizes the need to separate reporting data from operational data in his article, Reporting Database. He states:

"Most Enterprise Applications store persistent data with a database. This database supports operational updates of the application's state, and also various reports used for decision support and analysis. The operational needs and the reporting needs are, however, often quite different - with different requirements from a schema and different data access patterns. When this happens it's often a wise idea to separate the reporting needs into a reporting database, which takes a copy of the essential operational data but represents it in a different schema".

This is a great opportunity to start using a NoSQL store such as Cassandra as a reporting data store.
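The reporting-database idea (take a copy of the essential operational data and represent it in a different schema) can be sketched in a few lines of Python. This is a hypothetical illustration, not the book's implementation: the function `build_sales_report_table`, the order fields, and the choice of `(region, month)` as the report key are all assumptions. In a Cassandra reporting table, the report dimension would typically become the partition key, so a report reads one partition instead of joining operational tables.

```python
from collections import defaultdict


def build_sales_report_table(operational_orders):
    """Hypothetical export step: copy only the fields the report needs into a
    schema keyed by what the report asks for, (region, month) -> [(order_id, amount)]."""
    report = defaultdict(list)
    for order in operational_orders:
        key = (order["region"], order["month"])
        report[key].append((order["order_id"], order["amount"]))
    return dict(report)


# Hypothetical operational rows, shaped for the transactional application.
operational_orders = [
    {"order_id": "o1", "customer": "c1", "region": "EU", "month": "2015-10", "amount": 20.0, "status": "shipped"},
    {"order_id": "o2", "customer": "c2", "region": "US", "month": "2015-10", "amount": 35.5, "status": "paid"},
    {"order_id": "o3", "customer": "c1", "region": "EU", "month": "2015-10", "amount": 12.0, "status": "paid"},
]

report = build_sales_report_table(operational_orders)
print(report[("EU", "2015-10")])  # [('o1', 20.0), ('o3', 12.0)]
```

Fields the report does not need, such as `customer` and `status`, are left behind in the operational store, and generating the report becomes a key lookup rather than a multi-table query.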

Data aggregation and summarization are common requirements in any organization. They help to control data growth by storing only the summary statistics and moving the transactional data into archives. Often, the aggregated and summarized data is used for statistical analysis. On many websites, you can see a summary of your data instantly when you log in or perform a transaction. Examples include the available credit limit on a credit card, the number of text messages remaining, and the international call minutes remaining on a mobile phone account. Making the summary accurate and easily accessible is a big challenge. Most of the time, data aggregation and reporting go hand in hand: the aggregated data is used heavily in reports, and aggregation speeds up queries to a great extent. In an RDBMS, aggregating data is always a challenge, and new requirements keep coming all the time. This is another area where you can start using a NoSQL store such as Cassandra.
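For summaries such as remaining call minutes, one approach is to update the aggregate on every write, so the summary read is a single lookup while the detailed records go to an archive. The Python sketch below shows that pattern in memory; the `UsageSummary` class, the monthly allowance, and the account names are hypothetical. In Cassandra, a running total like this is often kept in a counter column that is incremented with an `UPDATE` statement, but the sketch only mimics the idea and does not use Cassandra.

```python
from collections import defaultdict


class UsageSummary:
    """Hypothetical write-time aggregation: keep a per-account running total
    so that 'remaining minutes' is a single lookup, not a scan of call records."""

    def __init__(self, monthly_allowance_minutes):
        self.allowance = monthly_allowance_minutes
        self.used_minutes = defaultdict(int)  # account -> minutes used (the summary)
        self.archive = []                     # raw call records, kept for archival/analysis

    def record_call(self, account, minutes):
        self.used_minutes[account] += minutes    # update the aggregate on every write
        self.archive.append((account, minutes))  # detail moves out of the read path

    def remaining_minutes(self, account):
        # .get avoids creating an entry for accounts with no calls yet.
        return max(self.allowance - self.used_minutes.get(account, 0), 0)


summary = UsageSummary(monthly_allowance_minutes=100)
summary.record_call("acct-1", 30)
summary.record_call("acct-1", 25)

print(summary.remaining_minutes("acct-1"))  # 45
print(summary.remaining_minutes("acct-2"))  # 100
```

Keeping the summary accurate then comes down to making sure every write path that records a transaction also updates the aggregate.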

Now, we are going to discuss some aspects of denormalization, reporting, and aggregation of data, using Cassandra as the preferred NoSQL data store.
