Amazon DynamoDB - The Definitive Guide

Amazon DynamoDB in Action

Amazon DynamoDB is a fully managed NoSQL database service from Amazon Web Services (AWS). It offers single-digit millisecond performance and virtually unlimited data storage, and it can be configured to support the throughput your workload requires at virtually any scale. Since its launch in January 2012, DynamoDB has evolved considerably and continues to gain features, enabling more businesses and enterprises to leverage the scale of this NoSQL offering.

Welcome to the first chapter of Amazon DynamoDB – The Definitive Guide, where Mike and I, Aman, dive into all things NoSQL, focusing on DynamoDB. We’ll explore its backstory, how it can supercharge your applications, and how it frees you from the hassles of managing clusters and instances. Learning NoSQL and DynamoDB isn’t too tough, but our aim with this book is to share everything we’ve learned to make your journey smoother. That’s why we’ve titled it The Definitive Guide – it’s your go-to resource for mastering DynamoDB.

The book is for those of you who are in roles such as software architects, developers, engineering managers, data engineers, product owners, or traditional database administrators. The book itself will provide you with all the skills and knowledge needed to make the most of DynamoDB for your application. This means benefiting from predictable performance at scale, while also being highly available, fault-tolerant, and requiring minimal ongoing management. I could continue listing the service’s benefits and why you should choose it for your applications, but it might be more valuable if you read through examples and success stories of other customers later in this chapter. Before we get started, let’s learn about Mike’s relationship with DynamoDB, in his own words.

My (Mike) journey into the world of DynamoDB has been a tale of two paths. The first path was to overall enlightenment, which was not as simple as I thought it might be – and that is what in part has led to this book. I wanted to share my knowledge, the way I approached data modeling, and how I had to unlearn some core concepts from the relational database world that then gave me that lightbulb moment when working with DynamoDB.

The second path was knowing where DynamoDB fits in my toolset, how to take advantage of the astonishing number of features that come baked into the service, and lastly, knowing when not to try and make workloads fit when there are more practical technologies out there. While DynamoDB can work for the majority of your database workloads, we need to be realistic and understand what it is not suited for, and that is where being armed with the right knowledge lets us make better, informed decisions about our database choices.

In this chapter, we will dive into DynamoDB’s position in today’s database market and trace its evolution over time. We will then explore real-world use cases showcasing the scale and volume that DynamoDB regularly handles, demonstrating capabilities that have been battle-hardened well beyond routine customer traffic.

We will then evaluate which workloads DynamoDB is suited for and how it can easily become one of the most used tools in your set of database technologies once you qualify DynamoDB as the right fit for the use case. There are many additional database technologies that do their own job extremely well, so knowing when to offload duties to the right technology is key in our decision making.

By the end of this chapter, you will have enough background knowledge and plenty of excitement to start setting up a working environment and looking at some useful tools we can leverage in preparation for getting hands-on with DynamoDB.

In this chapter, we will cover the following topics:

NoSQL and DynamoDB in the current database market
DynamoDB case studies
Workloads not suited for DynamoDB

Reviewing the role of NoSQL and DynamoDB in the current database market

Let’s take a look at how DynamoDB came to be – where it sits in the current NoSQL market, and how NoSQL compares to relational databases. Here, we cover a couple of core differences, both functional and technical, and learn of the very beginnings of the DynamoDB service.

My IT career started more than 20 years ago, in the early 2000s, when relational databases were the default choice for any application that required a data persistence layer. I had built up a fairly solid knowledge of MySQL in my builds and applications, as it was the de facto choice for many developers and open source enthusiasts at the time. Like many before (and after) me, I had been at the mercy of maxed-out CPU and connection limits on single-instance database servers, with headaches over performance, uptime, and reliability, and, of course, over keeping everything as secure as possible.

I remember sitting in the keynote session at the AWS London Summit at ExCeL, an exhibition center. An AWS customer was on stage talking about their use of DynamoDB. They were in the ad-space sector and bidding on real-time advertising impressions. Speed was crucial to their success (as is the case for many other industry verticals), and their database technology of choice was DynamoDB. I recall a slide where they explained that their peak requests per second (RPS) rate was frequently hitting 1 million RPS and that DynamoDB was easily handling this (and more when having to burst above that rate). I sat there thinking, “If there’s a database technology that lets me do 1M RPS without breaking a sweat or spending hours (if not days) having to configure it for that level, I want to be a part of it.” It was from there that I became engrossed in DynamoDB, and quite frankly, I have not looked back since.

Comparing RDBMS and NoSQL – a high-level overview

Over the past few years, NoSQL database technologies have steadily gained popularity, and their increasing ease of use and low cost have contributed to this rise. That said, the relational database management system (RDBMS) is certainly not going anywhere anytime soon. There is, and always will be, a market for both NoSQL and RDBMSs. A key differentiator between the two is that NoSQL tables are typically designed for specific, well-known, and defined access patterns and online transaction processing (OLTP) traffic, whereas an RDBMS allows for ad-hoc queries and/or online analytical processing (OLAP) traffic.

Let’s take a moment to step back. Relational databases have a rich history spanning many decades, excelling in their functionality. Whether open source or commercial, RDBMSs are widely used globally, with the choice of vendor often hinging on the application’s complexity. While there are numerous distinctions between relational database offerings and NoSQL alternatives, we’ll focus on one aspect here. In relational database design, a fundamental principle is normalization: the practice of organizing data and establishing relationships between tables to enhance data flexibility, protect integrity, and eliminate redundant data. Normalization is expressed in a series of normal forms, from First Normal Form (1NF) to Fifth Normal Form (5NF), and most designs aim for Third Normal Form (3NF). In contrast, NoSQL embraces data duplication so that data can be stored and presented in a denormalized, read-optimized fashion.
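To make the contrast concrete, here is a minimal sketch in plain Python, using in-memory dictionaries rather than any real database. The customer and order data, and the function names, are hypothetical and exist only for illustration. The normalized layout stores each customer fact once and assembles the result at read time, while the denormalized layout duplicates the customer attributes into every order so that a read is a single lookup.

```python
# Normalized (relational style): each fact is stored once, reads join tables.
customers = {1: {"name": "Alice", "city": "London"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25},
    {"order_id": 101, "customer_id": 1, "total": 40},
]


def read_order_normalized(order_id):
    """Find the order, then 'join' to the customer record at read time."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]
    return {
        "order_id": order["order_id"],
        "total": order["total"],
        "customer_name": customer["name"],
        "customer_city": customer["city"],
    }


# Denormalized (NoSQL style): customer attributes are copied into every order,
# trading extra storage for a read that needs no join.
orders_denormalized = {
    100: {"order_id": 100, "total": 25,
          "customer_name": "Alice", "customer_city": "London"},
    101: {"order_id": 101, "total": 40,
          "customer_name": "Alice", "customer_city": "London"},
}


def read_order_denormalized(order_id):
    """Return the pre-assembled order with a single key lookup."""
    return orders_denormalized[order_id]
```

Both functions return the same shape of result. The difference lies in where the work happens: the normalized read pays at query time, whereas the denormalized read pays in duplicated storage and in keeping the copies in sync when a customer's details change.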

So why is normalization so crucial to relational databases? Looking back over the history of relational databases, storage cost and disk size/availability were significant limiting factors. Reducing the duplication of data in the database left more storage available on the attached disk. In exchange, more work was offloaded to the CPU to join data together via relationships and help with the overall integrity of the data. Today, storage is far cheaper and storage space is almost limitless, so we can take advantage of these changes to reduce the load on the CPU (typically the most expensive part of the database layer) when retrieving our data. While these advances haven’t fundamentally changed the way relational databases work, this trade-off is a core value proposition of most NoSQL technologies.

Additionally, one of the primary issues that NoSQL database technologies aim to address is that of extreme scale. What does extreme scale mean in this case? Over time and with the evolution of cloud services and the global nature of the internet, traffic to applications continues to grow exponentially, as do the requests to any underlying databases. While maintaining capacity and availability at this scale, databases need to remain highly performant and deliver results in ever more demanding low-latency environments, where millisecond response times are considered normal and microsecond response times are often sought after. This can be achieved by horizontally scaling the data (and therefore the requests) across multiple storage devices, or nodes. Some relational databases can do this, but they do not necessarily perform well at scale. To scale a relational database, you would typically scale vertically by upgrading the hardware on the server itself, although this has its limits.
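The idea of spreading data, and therefore requests, across nodes can be sketched in a few lines of Python. This is a deliberately simplified illustration that assumes plain modulo hashing over a fixed number of in-memory partitions. It is not how DynamoDB or the original Dynamo system assigns data to partitions, and the names `partition_for` and `PartitionedStore` are hypothetical.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


class PartitionedStore:
    """Toy key-value store whose data is spread across several partitions."""

    def __init__(self, num_partitions: int):
        # Each dict stands in for a separate storage node.
        self.partitions = [dict() for _ in range(num_partitions)]

    def put(self, key: str, value):
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = value

    def get(self, key: str):
        # Only the one partition that owns the key is consulted.
        index = partition_for(key, len(self.partitions))
        return self.partitions[index].get(key)
```

Because every read and write touches only the partition that owns the key, adding partitions spreads both storage and request load. A production system also has to handle rebalancing when the number of partitions changes, which simple modulo hashing does poorly.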

The following table summarizes the high-level differences between RDBMS and NoSQL databases:

Characteristic	RDBMS	NoSQL
Data Organization	Encourage normalization	Encourage de-normalization
Scalability	Typically scale vertically; has limitations	Typically scale horizontally; can be extremely scalable
Schema Support	Strict schema, enforced on write	Typically schema-less, flexible
Integrity Constraints	Enforced through relationships and constraints	Depends on the database; not enforced for DynamoDB
Storage/Compute Optimized	Optimized for storage	Optimized for compute
Read Consistency Support	Strong consistency by design; eventual consistency on read replicas	Depends on the database; strong and eventual supported by DynamoDB
Atomicity, Consistency, Isolation, and Durability (ACID)	Most RDBMSs are designed for ACID	Depends on the database; limited ACID transactions supported for DynamoDB
Query Language	Typically SQL	Depends on the database; proprietary for DynamoDB
Examples	MySQL, Oracle, and Postgres	DynamoDB, MongoDB, Neo4j, and Redis

Table 1.1 – RDBMS and NoSQL high-level comparison

It is important to realize that NoSQL is a bigger umbrella of different database technologies that differ from each other in terms of the data structures they support, their query languages, and the nature of the stored data itself. NoSQL broadly covers the following categories:

Key-value
Document
Wide column
Graph

DynamoDB is of the key-value kind, but other categories may support vastly different data models and have their own query languages.

Without digging into all of those, let us next understand how DynamoDB came about.

How and where does DynamoDB fit into all of this?

DynamoDB was simply born out of frustration and a need to solve problems that were directly impacting Amazon’s customers. Toward the end of 2004, the Amazon.com retail platform suffered from several outages, attributed to database scalability, availability, and performance challenges with their relational database (1).

A group of database and distributed systems experts at Amazon wanted something better: a database that could support their business needs for the long term, scale as they needed, perform consistently, and offer the high availability that the retail platform demanded. A review of their current database usage revealed that ~70% of the operations they performed were of the key-value kind (2), and they were often not using the relational capabilities offered by their current database. Each of these operations only dealt with a single row at any given time.

Moreover, about 20% of the operations returned multiple rows, but all from the same table (2). That so few operations relied on core relational querying capabilities, such as expensive runtime JOINs, in the early 2000s was a testament to the engineering efforts at Amazon. Armed with this insight, a team assembled internally and designed a horizontally scalable distributed database, known back then as Dynamo, that banked on the simple key-value access requirements of typically smaller-sized rows (less than 1 MB).
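The two access patterns described above, fetching a single row by its key and fetching several rows from the same table, can be illustrated with a small in-memory sketch. This is not the Dynamo design or the DynamoDB API. The class `KeyValueTable`, its method names, and the two-part key (a group key plus a row key) are assumptions made purely for illustration.

```python
from collections import defaultdict


class KeyValueTable:
    """Toy table addressed by a group key plus a row key within that group."""

    def __init__(self):
        # group key -> {row key -> row}
        self._rows = defaultdict(dict)

    def put(self, group_key: str, row_key: str, row: dict):
        self._rows[group_key][row_key] = row

    def get(self, group_key: str, row_key: str):
        """Single-row lookup by full key, with no joins involved."""
        return self._rows.get(group_key, {}).get(row_key)

    def query(self, group_key: str):
        """Return every row that shares a group key, ordered by row key."""
        group = self._rows.get(group_key, {})
        return [group[row_key] for row_key in sorted(group)]


# Hypothetical shopping cart data
cart = KeyValueTable()
cart.put("cart#alice", "item#1", {"sku": "A", "qty": 2})
cart.put("cart#alice", "item#2", {"sku": "B", "qty": 1})
single = cart.get("cart#alice", "item#1")   # one row by its key
whole_cart = cart.query("cart#alice")       # several rows, same table
```

Neither operation needs to consult another table, which is what makes this style of access straightforward to distribute across many nodes.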

The early internal results of Dynamo were extremely promising. Over a year of running and operating their platform backed by this new database system, they were able to handle huge peaks, including holiday seasons, with far less effort. This included a shopping cart service that served tens of millions of requests and resulted in about three million checkouts in a single day. It also involved managing the session state of hundreds of thousands of concurrent active users shopping for themselves and their nearest and dearest. Most of this used the simple key-value access pattern.

This success led the team to go on to write Amazon’s Dynamo whitepaper (3), which was shared at the 2007 ACM Symposium on Operating Systems Principles (SOSP) conference so industry colleagues and peers could benefit from it. This paper, in turn, helped spark the category and development of distributed database technologies, commonly known as NoSQL. It was so influential that it inspired several NoSQL database technologies, most notably Apache Cassandra (4).

As with many services developed internally by Amazon, and with AWS being a continually growing and evolving business, the team soon realized that AWS customers might find Dynamo as useful and supportive as they had. Therefore, they furthered the design requirements to ensure it was easy to manage and operate, which is a key requirement for the mass adoption of almost any technology. In January 2012, the team launched Amazon DynamoDB (5), the cloud-based, fully-managed NoSQL database offering designed from launch to support extreme scale.

In terms of the general key-value database landscape, where does DynamoDB sit? According to the current (at the time of writing) DB-Engines Ranking (6A) list, Amazon’s DynamoDB is the second most popular key-value database behind Redis (7). Over the years, it has steadily and consistently increased in use and popularity and held its place in the chart firmly (6B).

The rest, as they say, is history.

Important note

We’ve only covered a very small fraction of NoSQL’s history here. For a fuller explanation of the rise and popularity of the NoSQL movement, I recommend reading Getting Started with NoSQL by Gaurav Vaish (8).

We’ve learned about the advantages that NoSQL and DynamoDB offer, but what does that mean in terms of usage, and how are people utilizing DynamoDB’s features? In the next section, we’ll examine some high-profile customer use cases that show just how powerful DynamoDB can be.
