Amazon DynamoDB - The Definitive Guide

Amazon DynamoDB in Action

Amazon DynamoDB is a fully-managed NoSQL database service from Amazon Web Services (AWS). It offers single-digit millisecond performance and virtually unlimited data storage, and it can easily be tuned to support any throughput required for your workload, at any scale. Since its launch in January 2012, DynamoDB has come a remarkably long way and consistently evolves to support additional features, enabling more businesses and enterprises to leverage the scale of this NoSQL offering.

Welcome to the first chapter of Amazon DynamoDB – The Definitive Guide, where Mike and I, Aman, dive into all things NoSQL, focusing on DynamoDB. We’ll explore its backstory, how it can supercharge your applications, and how it frees you from the hassles of managing clusters and instances. Learning NoSQL and DynamoDB isn’t too tough, but our aim with this book is to share everything we’ve learned to make your journey smoother. That’s why we’ve titled it The Definitive Guide – it’s your go-to resource for mastering DynamoDB.

The book is for those of you who are in roles such as software architects, developers, engineering managers, data engineers, product owners, or traditional database administrators. The book itself will provide you with all the skills and knowledge needed to make the most of DynamoDB for your application. This means benefiting from predictable performance at scale, while also being highly available, fault-tolerant, and requiring minimal ongoing management. I could continue listing the service’s benefits and why you should choose it for your applications, but it might be more valuable if you read through examples and success stories of other customers later in this chapter. Before we get started, let’s learn about Mike’s relationship with DynamoDB, in his own words.

My (Mike) journey into the world of DynamoDB has been a tale of two paths. The first path was to overall enlightenment, which was not as simple as I thought it might be – and that is what in part has led to this book. I wanted to share my knowledge, the way I approached data modeling, and how I had to unlearn some core concepts from the relational database world that then gave me that lightbulb moment when working with DynamoDB.

The second path was knowing where DynamoDB fits in my toolset, how to take advantage of the astonishing number of features that come baked into the service, and lastly, knowing when not to try and make workloads fit when there are more practical technologies out there. While DynamoDB can work for the majority of your database workloads, we need to be realistic and understand what it is not suited for, and that is where being armed with the right knowledge lets us make better, informed decisions about our database choices.

In this chapter, we will dive into DynamoDB’s position in today’s database market and trace its evolution over time. We will then explore real-world use cases showcasing the impressive scale and volume regularly achieved by DynamoDB, demonstrating its battle-hardened capabilities beyond routine customer traffic.

We will then evaluate which workloads DynamoDB is suited for and how it can easily become one of the most used tools in your set of database technologies once you qualify DynamoDB as the right fit for the use case. There are many additional database technologies that do their own job extremely well, so knowing when to offload duties to the right technology is key in our decision making.

By the end of this chapter, you will have enough background knowledge and plenty of excitement to start setting up a working environment and looking at some useful tools we can leverage in preparation for getting hands-on with DynamoDB.

In this chapter, we will cover the following topics:

NoSQL and DynamoDB in the current database market
DynamoDB case studies
Workloads not suited for DynamoDB

Reviewing the role of NoSQL and DynamoDB in the current database market

Let’s take a look at how DynamoDB came to be – where it sits in the current NoSQL market, and how NoSQL compares to relational databases. Here, we cover a couple of core differences, both functional and technical, and learn of the very beginnings of the DynamoDB service.

My IT career started more than 20 years ago, in the early 2000s, when relational databases were the default choice for any application that required a data persistence layer. I had built up a fairly solid knowledge of leveraging MySQL in my builds and applications, as was the de facto choice for many developers and open source enthusiasts at the time. Like many before (and after) me, I had been at the mercy of maxing out the CPU and connection limits of single-instance database servers, having headaches over performance, trying to manage uptime and reliability, and, of course, ensuring everything stayed as secure as possible.

I remember sitting in the keynote session at the AWS London Summit at ExCeL, an exhibition center. An AWS customer was on stage talking about their use of DynamoDB. They were in the ad-space sector and bidding on real-time advertising impressions. Speed was crucial to their success (as is the case for many other industry verticals), and their database technology of choice was DynamoDB. I recall a slide where they discussed that their peak request per second (RPS) rate was frequently hitting 1 million RPS and that DynamoDB was easily handling this (and more when having to burst above that rate). I sat there thinking, “If there’s a database technology that lets me do 1M RPS without breaking a sweat or spending hours (if not days) having to configure it for that level, I want to be a part of it.” It was from there that I became engrossed in DynamoDB, and quite frankly, I have not looked back since.

Comparing RDBMS and NoSQL – a high-level overview

Over the past few years, NoSQL database technologies have steadily gained in popularity, and their increasing ease of use and low cost have contributed to this rise. That said, the relational database management system (RDBMS) is certainly not going anywhere anytime soon. There is, and always will be, a market for both NoSQL and RDBMSs. A key differentiator between the two is that NoSQL tables are typically designed for specific, well-known, and defined access patterns and online transaction processing (OLTP) traffic, whereas an RDBMS allows for ad-hoc queries and/or online analytical processing (OLAP) traffic.

Let’s take a moment to step back. Relational databases have a rich history spanning many decades, excelling in their functionality. Whether open source or commercial, RDBMSs are widely used globally, with the choice of vendor often hinging on the application’s complexity. While there are numerous distinctions between relational database offerings and NoSQL alternatives, we’ll focus on one aspect here. In relational database design, a fundamental principle is normalization—a practice that involves organizing data and establishing relationships between tables to enhance data flexibility, ensure protection, and eliminate redundant data. This normalization is expressed in various normal forms (e.g., First Normal Form (1NF) to Fifth Normal Form (5NF)), typically achieving Third Normal Form (3NF) in most cases. In contrast, NoSQL embraces data duplication to facilitate storage and presentation in a denormalized and read-optimized fashion.

So why is normalization so crucial to relational databases? If we look back over the history of relational databases, storage cost and disk size/availability were significant limiting factors. Reducing the duplication of data in the database leaves more storage available on the attached disk. In exchange, more work was offloaded to the CPU to join data together via relationships and help with the overall integrity of the data. With today’s technological advances, storage is far cheaper and storage space is almost limitless, so we can take advantage of these changes to reduce the load on the CPU (typically the most expensive part of the database layer) when retrieving our data. While these advances haven’t fundamentally changed the way relational databases work, this is a core value proposition of most NoSQL technologies.
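To make the storage-versus-compute trade-off concrete, here is a minimal Python sketch. The customer and order records are hypothetical and the dictionaries merely stand in for tables; nothing here reflects a specific database engine. The normalized form stores each fact once and joins at read time, while the denormalized form copies the customer name into each order so a read becomes a single lookup.

```python
# Hypothetical data, for illustration only.

# Normalized: each fact is stored once; a read must join two "tables".
customers = {"c1": {"name": "Ada"}}
orders = {
    "o1": {"customer_id": "c1", "total": 25},
    "o2": {"customer_id": "c1", "total": 40},
}


def read_order_normalized(order_id):
    """Fetch an order, then follow the relationship to its customer (a join)."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    return {
        "order_id": order_id,
        "customer_name": customer["name"],
        "total": order["total"],
    }


# Denormalized: the customer name is duplicated into every order item,
# so a read is one key lookup at the cost of extra storage.
orders_denormalized = {
    "o1": {"order_id": "o1", "customer_name": "Ada", "total": 25},
    "o2": {"order_id": "o2", "customer_name": "Ada", "total": 40},
}


def read_order_denormalized(order_id):
    """Return the pre-joined item with a single lookup."""
    return orders_denormalized[order_id]
```

Both functions return the same result; the difference lies in where the work happens. The normalized read performs two lookups on every request, whereas the denormalized read pays for the duplication once, at write time.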

Additionally, one of the primary issues that NoSQL database technologies aim to address is that of extreme scale. What does extreme scale mean in this case? Over time and with the evolution of cloud services and the global nature of the internet, traffic to applications continues to grow exponentially, as do the requests to any underlying databases. While maintaining capacity and availability at this scale, databases need to remain highly performant and deliver results in ever-demanding low-latency environments – milliseconds and microseconds are considered normal, with the latter being often sought after. This can be achieved by horizontally scaling the data (and therefore the requests) across multiple storage devices, or nodes, a task that some relational databases can do but do not necessarily perform well at scale. To scale a relational database, you would typically scale vertically by upgrading the hardware on the server itself, although this has its limits.
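As a rough illustration of horizontal scaling, the following Python sketch spreads keys across several in-memory partitions by hashing each key. The names partition_for and PartitionedStore are hypothetical, and the simple modulo placement is an assumption made for brevity; it is not a description of how DynamoDB places data internally.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index by hashing it (simple modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """A toy key-value store whose data is split across several partitions."""

    def __init__(self, num_partitions: int):
        # Each dict stands in for a separate storage node.
        self.partitions = [dict() for _ in range(num_partitions)]

    def put(self, key: str, value) -> None:
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = value

    def get(self, key: str):
        # Only the partition that owns the key is consulted.
        index = partition_for(key, len(self.partitions))
        return self.partitions[index].get(key)
```

Because every request touches exactly one partition, adding partitions spreads both the data and the request load. A real system also has to move data when the partition count changes, which this sketch does not attempt.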

The following table summarizes the high-level differences between RDBMS and NoSQL databases:

Characteristic	RDBMS	NoSQL
Data Organization	Encourage normalization	Encourage de-normalization
Scalability	Typically scale vertically; has limitations	Typically scale horizontally; can be extremely scalable
Schema Support	Strict schema, enforced on write	Typically schema-less, flexible
Integrity Constraints	Enforced through relationships and constraints	Depends on the database; not enforced for DynamoDB
Storage/Compute Optimized	Optimized for storage	Optimized for compute
Read Consistency Support	Strong consistency by design; eventual consistency on read replicas	Depends on the database; strong and eventual supported by DynamoDB
Atomicity, Consistency, Isolation, and Durability (ACID)	Most RDBMSs are designed for ACID	Depends on the database; limited ACID transactions supported for DynamoDB
Query Language	Typically SQL	Depends on the database; proprietary for DynamoDB
Examples	MySQL, Oracle, and Postgres	DynamoDB, MongoDB, Neo4j, and Redis

Table 1.1 – RDBMS and NoSQL high-level comparison

It is important to realize that NoSQL is a bigger umbrella of different database technologies that differ from each other in terms of the data structures they support, their query languages, and the nature of the stored data itself. NoSQL broadly covers the following categories:

Key-value
Document
Wide column
Graph

DynamoDB is of the key-value kind, as we have established already, but other categories may support vastly different data models and have their own query languages.

Without digging into all of those, let us next understand how DynamoDB came about.

How and where does DynamoDB fit into all of this?

DynamoDB was simply born out of frustration and a need to solve problems that were directly impacting Amazon’s customers. Toward the end of 2004, the Amazon.com retail platform suffered from several outages, attributed to database scalability, availability, and performance challenges with their relational database (1).

A group of database and distributed systems experts at Amazon wanted something better: a database that could support their business needs for the long term, scale as they needed, perform consistently, and offer the high availability that the retail platform demanded. A review of their current database usage revealed that ~70% of the operations they performed were of the key-value kind (2), and they were often not using the relational capabilities offered by their current database. Each of these operations only dealt with a single row at any given time.

Moreover, about 20% of the operations returned multiple rows but all from the same table (2). Not utilizing core querying capabilities of relational database systems such as expensive runtime JOINs in the early 2000s was a testament to the engineering efforts at Amazon. Armed with this insight, a team assembled internally and designed a horizontally scalable distributed database that banked on the simple key-value access requirements of typically smaller-sized rows (less than 1 MB), known back then as Dynamo.

The early internal results of Dynamo were extremely promising. Over a year of running and operating their platform backed by this new database system, they were able to handle huge peaks including holiday seasons much more effortlessly. This included a shopping cart service that served tens of millions of requests and resulted in about three million checkouts in a single day. This also involved managing the session state of hundreds of thousands of concurrent active users shopping for themselves and their nearest and dearest ones. Most of this was using the simple key-value access pattern.

This success led the team to go on to write Amazon’s Dynamo whitepaper (3), which was shared at the 2007 ACM Symposium on Operating Systems Principles (SOSP) conference so industry colleagues and peers could benefit from it. This paper, in turn, helped spark the category and development of distributed database technologies, commonly known as NoSQL. It was so influential that it inspired several NoSQL database technologies, most notably Apache Cassandra (4).

As with many services developed internally by Amazon, and with AWS being a continually growing and evolving business, the team soon realized that AWS customers might find Dynamo as useful and supportive as they had. Therefore, they furthered the design requirements to ensure it was easy to manage and operate, which is a key requirement for the mass adoption of almost any technology. In January 2012, the team launched Amazon DynamoDB (5), the cloud-based, fully-managed NoSQL database offering designed from launch to support extreme scale.

In terms of the general key-value database landscape, where does DynamoDB sit? According to the current (at the time of writing) DB-Engines Ranking (6A) list, Amazon’s DynamoDB is the second most popular key-value database behind Redis (7). Over the years, it has steadily and consistently increased in use and popularity and held its place in the chart firmly (6B).

The rest, as they say, is history.

Important note

We’ve only covered a very small fraction of NoSQL’s history in a nutshell. For a greater explanation of the rise and popularity of the NoSQL movement, I recommend reading Getting Started with NoSQL by Gaurav Vaish (8).

We’ve learned about the advantages that NoSQL and DynamoDB offer, but what does that mean in terms of usage, and how are people utilizing DynamoDB’s features? In the next section, we’ll examine some high-profile customer use cases that show just how powerful DynamoDB can be.

DynamoDB case studies

DynamoDB is currently used by more than a million customers globally (9). These range from small applications performing a few requests here and there, to enterprise-grade systems that continually require millions of requests per day, and all these and more depend on DynamoDB to power mission-critical services.

There is an incredibly high chance that you have indirectly been a DynamoDB user if you have spent any time on the internet. From retail to media to motorsports and hospitality, DynamoDB powers a sizable number of online applications. One of the best ways to find out about the incredible workloads and systems that not only DynamoDB but also AWS helps to power is to hear it through the voice of their customers.

Keep in mind that while these case studies shine a light on some of the large-scale DynamoDB solutions in use today, they may not represent what you or I need to use DynamoDB for right now. The important takeaway is that every developer, start-up, or enterprise has the same power at their fingertips with DynamoDB as the customers in the following case studies. Every feature these customers have access to within the service, you do too, and that’s where the power of not only DynamoDB but also AWS comes in.

The upcoming case studies illustrate how DynamoDB users adopted the service either to overcome specific challenges or to leverage their trust in the platform based on prior positive experiences with the technology.

Amazon retail

As we briefly covered in the previous section, Amazon.com, the retail platform, heavily relied on its existing relational databases but was struggling to scale its system to keep pace with the rapid growth and demand of the platform. The Amazon Herd team decided to move over to DynamoDB (10).

Migrating from a relational database to a NoSQL offering takes time, planning, and understanding of the workload that is going to be migrated. The internal teams (as we’ve highlighted) were able to identify that ~70% of the current database queries were that of the key-value type (the lookup was performed by a single primary key with which a single row was returned), which aligns well with NoSQL.

Once fully implemented, one of the many benefits the Herd team was able to make use of was being able to reduce their planning and the time required to scale the system for large events by 90%. This allowed the team to spend more time in other areas, such as creating new value-adding features for the Amazon retail platform.

On the back of this success, DynamoDB now powers many of the mission-critical systems used within Amazon. To show just how instrumental the move to DynamoDB was, consider the multiple-day Amazon Prime Day event in 2023, about which an official AWS post (11) states the following:

“DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfilment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 126 million requests per second.”

– Jeff Barr, Chief Evangelist for Amazon Web Services

Let’s just think about that for a second – 126 million requests per second?! For any service to be able to deliver this level of throughput is an incredible feat. What’s more, this was delivered on top of all other AWS customers’ workloads across the globe, without breaking a sweat. Whatever you can throw at DynamoDB, the service can handle it in its stride. To this day (and every year when I read the latest Amazon Prime Day stats), I’m in awe of the tremendous power that DynamoDB offers. This is one of the many reasons why so many customers choose DynamoDB to power their workloads.

What is Amazon Herd?

Amazon Herd is an internal orchestration system that enables over 1,300 workflows within the Amazon retail system. Herd is used to manage services ranging from order processing to certain fulfillment center activities, and it also powers parts of the cloud functionality behind Amazon Alexa.

Disney+

In April 2021, the Walt Disney Company outlined in a press release (12) how they are using AWS to support their global expansion of Disney+, one of the largest online streaming video services in the world. Disney+ scales part of its feature set globally on DynamoDB.

If, like me, you have used Disney+ and have added a video to your watchlist, or perhaps you have started watching any of their video content, paused it, and have come back later, or even watched the rest of it on a different device, these features have data that persisted under the hood with DynamoDB. Globally, these events (and more) amount to billions of customer actions each day.

Disney+ takes advantage of AWS’ global footprint to deploy its services in multiple regions (13), and with DynamoDB, they are making use of a feature called Global Tables (14) (see Chapter 13, Global Tables). In a nutshell, Global Tables provides a fully-managed solution for multi-region, multi-active tables within DynamoDB. You simply enable the feature and decide which regions you want to be “active” in, and DynamoDB takes care of regional provisioning, any data back-filling, and then all ongoing replication for you. It’s incredibly powerful and requires no maintenance from you. It’s just one of the mechanisms in place that we can refer to when thinking about DynamoDB’s “click-button” scaling.

With Global Tables in place, many of the Disney+ APIs are configured to read and write content within the same region where the data is located, reducing overall latencies to and from the database. Global Tables also allows their services to failover geographically from one entire region to another, from both a technical failure perspective as well as being able to perform maintenance in a given region without incurring user downtime or affecting the availability and usability of the overall Disney+ service.

I highly recommend watching an AWS re:Invent video titled How Disney+ scales globally on Amazon DynamoDB (15). Next, let us review the characteristics of workloads that may not be suited for DynamoDB.

Workloads not suited for DynamoDB

Let’s quickly cover a couple of areas that aren’t as well supported with DynamoDB, such as search and analytics. While not technically impossible to implement, often, the requirements are strong enough to warrant offloading these tasks to a dedicated technology. Just because we are using a NoSQL database, it doesn’t mean that we should shoehorn every workload into it. As I called out in the opening paragraph, it’s better to know when not to use a technology than to force it to work and end up with an overly and unnecessarily complex data model because of it.

While DynamoDB does have a very basic ASCII search operator available, this is only available on a sort key attribute or by using filtering on additional attributes. It can only perform left-to-right or exact string matching and does not offer any level of fuzzy matching or in-depth or regular expression matching. Unless your string matching operates at this level, you will need to utilize a dedicated technology for this. An approach often seen, and well documented, is to support search queries with Amazon OpenSearch Service (16, 17A, 17B), or similar.
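The left-to-right limitation follows from how sorted keys behave. The sketch below uses a plain sorted Python list as a stand-in for the sort keys within one partition; the key values are hypothetical, and the function is only named begins_with to echo DynamoDB's key condition of the same name. It shows why a prefix can be located efficiently while an arbitrary substring cannot.

```python
import bisect

# Hypothetical sort keys, kept in sorted order as a stand-in for one partition.
sort_keys = sorted([
    "ORDER#2023-01-15",
    "ORDER#2023-02-01",
    "ORDER#2024-01-03",
    "PROFILE",
])


def begins_with(sorted_keys, prefix):
    """Return every key that starts with prefix.

    Keys sharing a prefix sit next to each other in sorted order, so one
    binary search finds the first candidate and a short scan collects the rest.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches
```

Calling begins_with(sort_keys, "ORDER#2023") returns the two 2023 orders, whereas searching for the fragment "01-15" finds nothing, because sorted order gives no help in locating text in the middle of a key. Substring, fuzzy, or regular expression matching therefore needs a full scan or a dedicated search technology.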

Unlike some other NoSQL technologies, such as MongoDB (18), DynamoDB does not have any kind of aggregation framework built into it. This means you cannot easily perform analytical queries across results or datasets within your table. If you need to generate sums of data (for dashboarding, for example), this must be done within your application and the results written back into DynamoDB for later retrieval. Although an advanced design pattern using DynamoDB Streams and AWS Lambda could be leveraged to perform some kinds of aggregations, on a holistic level, the database engine itself does not support native operators for aggregation queries.
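A minimal sketch of the application-side aggregation described above might look like the following. It assumes the event layout that DynamoDB Streams delivers to AWS Lambda (a Records list whose entries carry eventName and dynamodb.NewImage in DynamoDB's typed JSON); the attribute names customer_id and amount are hypothetical, as is the function name. The sketch only computes the per-customer deltas and leaves writing them back to the table out of scope.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate_order_totals(event):
    """Sum the 'amount' of newly inserted items per customer in a stream event."""
    deltas = defaultdict(Decimal)  # Decimal() starts each customer at 0
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # this sketch ignores MODIFY and REMOVE events
        image = record["dynamodb"]["NewImage"]
        customer_id = image["customer_id"]["S"]  # "S" marks a string attribute
        deltas[customer_id] += Decimal(image["amount"]["N"])  # "N" is a numeric string
    return dict(deltas)
```

In a real function, each delta would then be applied to a summary item so that dashboards can read a precomputed total with a single lookup. Handling updates, deletes, and retried batches would need additional care that is not shown here.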

There is a misconception with NoSQL that once you have created your model, you cannot change it or work with newer and evolving access patterns – that’s untrue. It’s always worth pointing out that with many production applications, it is common for these access patterns to change organically over time. There could be several reasons for this: a change in business model, or a new team or service is introduced to the company that has an alternative access pattern that needs fulfilling. With DynamoDB, you are not strictly “stuck” with your original data model design. The DynamoDB team has got you covered with support for additional indices on your table by using what’s called a Global Secondary Index (GSI) (19).

We will cover data modeling and design in a lot more depth over the coming chapters, giving plenty of opportunity to explore how we build data models and accommodate changes with GSIs – don’t worry if this does not make sense right now, just know that you can support additional access patterns on the same table, without necessarily having to rebuild and restructure the entire table itself.
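As a small preview, the sketch below builds the request parameters for adding a GSI to an existing table through the UpdateTable API. The helper name build_gsi_update and the table, index, and attribute names in the usage note are hypothetical. It also assumes an on-demand table; a table using provisioned capacity would additionally need a ProvisionedThroughput entry for the new index.

```python
def build_gsi_update(table_name, index_name, partition_key, key_type="S"):
    """Build keyword arguments for an UpdateTable call that adds one GSI.

    Assumes an on-demand table, so no ProvisionedThroughput is included.
    """
    return {
        "TableName": table_name,
        # The new index key must be declared as an attribute of the table.
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": key_type},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": partition_key, "KeyType": "HASH"},
                    ],
                    # Copy all attributes into the index; narrower projections exist.
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


# Possible usage with boto3 (not executed here):
#   import boto3
#   params = build_gsi_update("Orders", "by_status", "status")
#   boto3.client("dynamodb").update_table(**params)
```

The point of the sketch is that the new access pattern is added alongside the existing table rather than by rebuilding it.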
