Let’s take a look at how DynamoDB came to be, where it sits in the current NoSQL market, and how NoSQL compares to relational databases. Here, we cover a few core differences, both functional and technical, and learn about the origins of the DynamoDB service.
My IT career started more than 20 years ago, in the early 2000s, when relational databases were the default choice for any application that required a data persistence layer. I had built up a fairly solid knowledge of MySQL in my builds and applications, as it was the de facto standard for many developers and open source enthusiasts at the time. Like many before (and after) me, I had been at the mercy of single-instance database servers that maxed out their CPU and connection limits, suffering headaches over performance, trying to manage uptime and reliability, and, of course, ensuring everything stayed as secure as possible.
I remember sitting in the keynote session at the AWS London Summit at ExCeL, an exhibition center. An AWS customer was on stage talking about their use of DynamoDB. They were in the ad-space sector, bidding on real-time advertising impressions. Speed was crucial to their success (as is the case for many other industry verticals), and their database technology of choice was DynamoDB. I recall a slide showing that their peak rate frequently hit 1 million requests per second (RPS) and that DynamoDB was easily handling this (and more when having to burst above that rate). I sat there thinking, “If there’s a database technology that lets me do 1M RPS without breaking a sweat or spending hours (if not days) having to configure it for that level, I want to be a part of it.” It was from there that I became engrossed in DynamoDB, and quite frankly, I have not looked back since.
Comparing RDBMS and NoSQL – a high-level overview
Over the past few years, NoSQL database technologies have steadily gained in popularity, and their increasing ease of use and low cost have contributed to this rise. That said, the relational database management system (RDBMS) is certainly not going anywhere anytime soon. There is, and always will be, a market for both NoSQL and RDBMSs. A key differentiator between the two is that NoSQL tables are typically designed for specific, well-known, and defined access patterns and online transaction processing (OLTP) traffic, whereas an RDBMS allows for ad hoc queries and/or online analytical processing (OLAP) traffic.
Let’s take a moment to step back. Relational databases have a rich history spanning many decades, excelling in their functionality. Whether open source or commercial, RDBMSs are widely used globally, with the choice of vendor often hinging on the application’s complexity. While there are numerous distinctions between relational database offerings and NoSQL alternatives, we’ll focus on one aspect here. In relational database design, a fundamental principle is normalization: the practice of organizing data and establishing relationships between tables to keep data flexible, protect its integrity, and eliminate redundancy. Normalization is expressed as a series of normal forms, from First Normal Form (1NF) to Fifth Normal Form (5NF), with most designs targeting Third Normal Form (3NF). In contrast, NoSQL embraces data duplication so that data can be stored and presented in a denormalized, read-optimized fashion.
So why is normalization so crucial to relational databases? If we look back over the history of relational databases, we can see that storage cost and disk size/availability were significant limiting factors. Reducing the duplication of data freed up space on the attached disk, at the price of pushing more work onto the CPU, which had to join data together via relationships and help maintain its overall integrity. With today’s technological advances, storage is far cheaper and almost limitless, so we can take advantage of these changes to reduce the load on the CPU (often regarded as the most expensive part of the database layer) when retrieving our data. While these advances haven’t fundamentally changed the way relational databases work, this is a core value proposition of most NoSQL technologies.
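To make the trade-off above concrete, here is a minimal Python sketch that uses plain dictionaries in place of a real database. The customer, product, and order records, and the two helper functions, are hypothetical and exist only for illustration. The normalized read follows references at read time (the work a JOIN does), while the denormalized read is a single key lookup over data that was duplicated when it was written.

```python
# Normalized layout: each fact is stored once and linked by ID.
customers = {1: {"name": "Ada"}}
products = {10: {"title": "Keyboard", "price": 45.0}}
orders = [{"order_id": 100, "customer_id": 1, "product_id": 10, "quantity": 2}]


def read_order_normalized(order_id):
    """Assemble an order at read time by following references (a join)."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]
    product = products[order["product_id"]]
    return {
        "order_id": order["order_id"],
        "customer_name": customer["name"],
        "product_title": product["title"],
        "total": product["price"] * order["quantity"],
    }


# Denormalized layout: the same order is stored pre-joined under one key.
# Customer and product details are duplicated into every order item.
orders_denormalized = {
    100: {
        "order_id": 100,
        "customer_name": "Ada",
        "product_title": "Keyboard",
        "total": 90.0,
    }
}


def read_order_denormalized(order_id):
    """Fetch the read-optimized item with a single key lookup."""
    return orders_denormalized[order_id]
```

Both functions return the same order, but the denormalized version spends extra storage (and extra effort at write time) to avoid the read-time join.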
Additionally, one of the primary issues that NoSQL database technologies aim to address is that of extreme scale. What does extreme scale mean in this case? With the evolution of cloud services and the global nature of the internet, traffic to applications keeps growing, as do the requests to any underlying databases. While maintaining capacity and availability at this scale, databases need to remain highly performant and deliver results in ever more demanding low-latency environments, where response times measured in milliseconds are the norm and microseconds are often sought after. This can be achieved by horizontally scaling the data (and therefore the requests) across multiple storage devices, or nodes, a task that some relational databases can perform but do not necessarily handle well at scale. To scale a relational database, you would typically scale vertically by upgrading the hardware on the server itself, although this has its limits.
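As a rough illustration of horizontal scaling, the following Python sketch spreads keys across several nodes by hashing each key. The node names, the `node_for_key` function, and the `PartitionedStore` class are assumptions made for this example. It uses simple modulo hashing, which is not how DynamoDB or any particular database partitions data; production systems typically rely on more elaborate schemes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for_key(key, nodes=NODES):
    """Map a key to a node using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest, "big") % len(nodes)]


class PartitionedStore:
    """Toy key-value store whose data is split across several nodes."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        # One independent dictionary per node stands in for separate servers.
        self.data = {node: {} for node in self.nodes}

    def put(self, key, value):
        self.data[node_for_key(key, self.nodes)][key] = value

    def get(self, key):
        # Only the node that owns the key is consulted.
        return self.data[node_for_key(key, self.nodes)].get(key)
```

Because every request touches only the node that owns its key, adding nodes spreads both the stored data and the request load. One known weakness of plain modulo hashing is that changing the number of nodes remaps most keys, which is one reason real systems prefer other partitioning schemes.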
The following table summarizes the high-level differences between RDBMS and NoSQL databases:
| Characteristic | RDBMS | NoSQL |
| --- | --- | --- |
| Data Organization | Encourages normalization | Encourages denormalization |
| Scalability | Typically scales vertically; has limitations | Typically scales horizontally; can be extremely scalable |
| Schema Support | Strict schema, enforced on write | Typically schema-less, flexible |
| Integrity Constraints | Enforced through relationships and constraints | Depends on the database; not enforced for DynamoDB |
| Storage/Compute Optimized | Optimized for storage | Optimized for compute |
| Read Consistency Support | Strong consistency by design; eventual consistency on read replicas | Depends on the database; strong and eventual supported by DynamoDB |
| Atomicity, Consistency, Isolation, and Durability (ACID) | Most RDBMSs are designed for ACID | Depends on the database; limited ACID transactions supported for DynamoDB |
| Query Language | Typically SQL | Depends on the database; proprietary for DynamoDB |
| Examples | MySQL, Oracle, and Postgres | DynamoDB, MongoDB, Neo4j, and Redis |
Table 1.1 – RDBMS and NoSQL high-level comparison
It is important to realize that NoSQL is an umbrella term for a range of database technologies that differ from each other in the data structures they support, their query languages, and the nature of the stored data itself. NoSQL broadly covers the following categories:
- Key-value
- Document
- Wide column
- Graph
DynamoDB is of the key-value kind, as we have established already, but other categories may support vastly different data models and have their own query languages.
Without digging into all of those, let us next understand how DynamoDB came about.
How and where does DynamoDB fit into all of this?
DynamoDB was simply born out of frustration and a need to solve problems that were directly impacting Amazon’s customers. Toward the end of 2004, the Amazon.com retail platform suffered from several outages, attributed to database scalability, availability, and performance challenges with their relational database (1).
A group of database and distributed systems experts at Amazon wanted something better: a database that could support their business needs for the long term, scale as they needed, perform consistently, and offer the high availability that the retail platform demanded. A review of their current database usage revealed that ~70% of the operations they performed were of the key-value kind (2), and they were often not using the relational capabilities offered by their current database. Each of these operations only dealt with a single row at any given time.
Moreover, about 20% of the operations returned multiple rows but all from the same table (2). Avoiding core querying capabilities of relational database systems, such as expensive runtime JOINs, in the early 2000s was a testament to the engineering efforts at Amazon. Armed with this insight, a team assembled internally and designed a horizontally scalable distributed database, known back then as Dynamo, that banked on simple key-value access to typically small rows (less than 1 MB).
The early internal results of Dynamo were extremely promising. Over a year of running and operating their platform backed by this new database system, they were able to handle huge peaks including holiday seasons much more effortlessly. This included a shopping cart service that served tens of millions of requests and resulted in about three million checkouts in a single day. This also involved managing the session state of hundreds of thousands of concurrent active users shopping for themselves and their nearest and dearest ones. Most of this was using the simple key-value access pattern.
This success led the team to go on to write Amazon’s Dynamo whitepaper (3), which was shared at the 2007 ACM Symposium on Operating Systems Principles (SOSP) conference so industry colleagues and peers could benefit from it. This paper, in turn, helped spark the category and development of distributed database technologies, commonly known as NoSQL. It was so influential that it inspired several NoSQL database technologies, most notably Apache Cassandra (4).
As with many services developed internally by Amazon, and with AWS being a continually growing and evolving business, the team soon realized that AWS customers might find Dynamo as useful and supportive as they had. Therefore, they extended the design requirements to ensure it was easy to manage and operate, which is a key requirement for the mass adoption of almost any technology. In January 2012, the team launched Amazon DynamoDB (5), the cloud-based, fully managed NoSQL database offering designed from launch to support extreme scale.
In terms of the general key-value database landscape, where does DynamoDB sit? According to the current (at the time of writing) DB-Engines Ranking (6A) list, Amazon’s DynamoDB is the second most popular key-value database behind Redis (7). Over the years, it has steadily and consistently increased in use and popularity and held its place in the chart firmly (6B).
The rest, as they say, is history.
Important note
We’ve only covered a very small fraction of NoSQL’s history. For a fuller account of the rise and popularity of the NoSQL movement, I recommend reading Getting Started with NoSQL by Gaurav Vaish (8).
We’ve learned about the advantages that NoSQL and DynamoDB offer, but what does that mean in terms of usage, and how are people utilizing DynamoDB’s features? In the next section, we’ll examine some high-profile customer use cases that show just how powerful DynamoDB can be.