You're reading from Cassandra High Availability Harness the power of Apache Cassandra to build scalable, fault-tolerant, and readily available applications

Product type Paperback

Published in Dec 2014

Publisher Packt

ISBN-13 9781783989126

Length 186 pages

Edition 1st Edition

Tools

Cassandra

Concepts

Database Administration

Author (1):

Robbie Strickland

View More author details

Table of Contents (11) Chapters

Preface

1. Cassandra's Approach to High Availability FREE CHAPTER

2. Data Distribution

3. Replication

4. Data Centers

5. Scaling Out

6. High Availability Features in the Native Java Client

7. Modeling for High Availability

8. Antipatterns

9. Failing Gracefully

Index

The master-slave architecture

As distributed systems have become more commonplace, the need for higher capacity distributed databases has grown. Many distributed databases still attempt to maintain ACID guarantees (or in some cases only the consistency aspect, which is the most difficult in a distributed environment), leading to the master-slave architecture.

In this approach, there might be many servers handling requests, but only one server can actually perform writes so as to maintain data in a consistent state. This avoids the scenario where the same data can be modified via concurrent mutation requests to different nodes. The following diagram shows the most basic scenario:

However, we still have not solved the availability problem, as a failure of the write master would lead to application downtime. It also means that writes do not scale well, since they are all directed to a single machine.

Sharding

A variation on the master-slave approach that enables higher write volumes is a technique called sharding, in which the data is partitioned into groups of keys, such that one or more masters can own a known set of keys. For example, a database of user profiles can be partitioned by the last name, such that A-M belongs to one cluster and N-Z belongs to another, as follows:

An astute observer will notice that both master-slave and sharding introduce failure points on the master nodes, and in fact the sharding approach introduces multiple points of failure—one for each master! Additionally, the knowledge of where requests for certain keys go rests with the application layer, and adding shards requires manual shuffling of data to accommodate the modified key ranges.

Some systems employ shard managers as a layer of abstraction between the application and the physical shards. This has the effect of removing the requirement that the application must have knowledge of the partition map. However, it does not obviate the need for shuffling data as the cluster grows.

Master failover

A common means of increasing availability in the event of a failure on a master node is to employ a master failover protocol. The particular semantics of the protocol vary among implementations, but the general principle is that a new master is appointed when the previous one fails. Not all failover algorithms are equal; however, in general, this feature increases availability in a master-slave system.

Even a master-slave database that employs leader election suffers from a number of undesirable traits:

Applications must understand the database topology
Data partitions must be carefully planned
Writes are difficult to scale
A failover dramatically increases the complexity of the system in general, and especially so for multisite databases
Adding capacity requires reshuffling data with a potential for downtime

Considering that our objective is a highly available system, and presuming that scalability is a concern, are there other options we need to consider?

You're reading from Cassandra High Availability Harness the power of Apache Cassandra to build scalable, fault-tolerant, and readily available applications

Table of Contents (11) Chapters

The master-slave architecture

Sharding

Master failover

Authors (1)

Personalised recommendations for you