Efficient use of primary keys
As DynamoDB is a NoSQL database and is used with scalable applications, table data might grow exponentially. This might reduce data read and write throughput (the number of 1 KB read or write requests per second) if not managed efficiently. This management starts right from choosing the correct primary key and its parameters. Take a look at the following table:
As soon as the table is created, the table data is partitioned on the hash key attribute. What this means is that if the table has three partitions, then the first two items will go to the first partition, the third item will go to the second partition and the last item will go to the third partition. This partition is based purely on hash logic, which we are not going to discuss here.
In our library catalogue example, we are always looking for a certain book, with the assumption that the first thing that comes to our mind when identifying a book is its title. That is why we decided to set the BookTitle
attribute as the hash key. Another reason why we chose this specific attribute as the hash key is the assumption that most of the scan operations for the table will include the BookTitle
attribute.
DynamoDB does not allow duplication of the hash key (provided that the table does not have a range attribute), so if the primary key is a simple hash key, then we are enforcing that an entry cannot be made into the previous table with the same book title. But in a real-world scenario this is not the case. So we are in need of a range key attribute as well. The next decision to be taken is what should be made the range attribute. We will assume that the second attribute that comes to mind when identifying a book is the name of its author. Unlike the hash key attribute, range key attributes are ordered (also grouped on the hash key attribute). Here also we are enforcing upon DynamoDB that the same author will never write a book on the same title.
Take a look at the following table (which is incorrect and is shown only to understand the concept):
But this might fail in several cases because the later editions of the book might have been authored by the same author. In this case, the second item insertion will simply overwrite the first item because the primary key is duplicated. As a solution, at this point in time I'd recommend you to concatenate the Author
attribute along with the Edition
attribute separated by #
(or any other acceptable delimiter). So the table will look as follows:
Observe the String
range key attribute Author#Edition
. Even if some of the items don't have the edition included in the range key attribute, it will not create any trouble at the DynamoDB end (but we have to take care from the application programming front).
Some of you might have thought of making the range key attribute type as StringSet
, but remember that hash or range key attributes cannot be a Set
type.
There are a few things to be kept in mind before choosing the correct hash and range attributes:
- Since the table is partitioned based on the hash key attribute, do not choose repeating attributes that will have only single-digit (very few) unique values. For example, the
Language
attribute of our table has only three identical values. Choosing this attribute will eat up a lot of throughput. - Give the most restricted data type. For example, if we decide to make some number attributes as primary key attributes, then (even though
String
can also store numbers) we must use theNumber
data type only, because the hash and ordering logic will differ for each data type. Other advantages will be discussed in Chapter 5, Query and Scan Operations in DynamoDB, while discussing query and scan. - Do not put too many attributes or too lengthy attributes (using delimiter as discussed formerly) into the primary key attributes, because it becomes mandatory that every item must have these attributes and all the attributes will become part of the query operation, which is inefficient.
- Make the attribute to be ordered as the range key attribute.
- Make the attribute to be grouped (or partitioned) as the hash key attribute.