Data model
The data model is easiest to understand by comparison with an RDBMS. In an RDBMS, a table is organized into rows and columns, but in DynamoDB we will not use these two words (except in this paragraph). If either word slips in by mistake, please remember that rows are called items and columns are called attributes in DynamoDB, as shown in the following table:
Having said that, let's look at how a table is realized in DynamoDB. Throughout this book, we are going to use a common illustration, a library catalogue, and discuss examples related to it. Let's take a look at the library catalogue table:
Tip
If you wish to know how to create a table with the attributes mentioned in Table 1.1, read the DynamoDB data types section first. During the creation of a DynamoDB table, it is only possible to specify secondary index attributes, and hash and range key attributes. It is not possible to specify other attributes (previously mentioned as optional attributes) during the creation of the table. In fact, except for hash and range key attributes, all other attributes are part of the items (rows); that is the reason why we don't specify these optional attributes while creating the table.
Let's call the table Tbl_Book. The table has seven attributes. The first two attributes act as a compound primary key. We set the first attribute, BookTitle, as the hash key and the second attribute, Author, as the range key. Except for the primary key attributes, all other attributes are optional, and we need not specify non-primary key attributes while creating the table.
Therefore, during the creation of the Tbl_Book table in DynamoDB, we will specify only the BookTitle and Author attributes. All other attributes can be specified while inserting an item into this table.
Let's assume that Tbl_Book has been created in DynamoDB with the BookTitle and Author attributes as the hash and range keys. We will now insert four items into the table as shown:
One quick question: while inserting the first item into the table, do we need to specify the PubDate attribute as null? The answer is no; every item can have its own attributes, along with the mandatory primary key attributes specified during table creation. In fact, if we want to insert a fifth item with a new attribute named CoverPhoto, we can do it without affecting the previous four items.
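To make this concrete, here is a minimal sketch that models items as plain Python dictionaries. Only the attribute names BookTitle, Author, PubDate, and CoverPhoto come from the text; the values are made-up placeholders, and put_item is a hypothetical helper written for this illustration, not a DynamoDB API call.

```python
# Illustrative model only: a "table" is a list of items, and each item is a
# dict that carries its own attribute names. Values are made-up placeholders.
KEY_ATTRIBUTES = ("BookTitle", "Author")  # hash key and range key of Tbl_Book


def put_item(table, item):
    """Append an item after checking that the primary key attributes exist."""
    missing = [name for name in KEY_ATTRIBUTES if name not in item]
    if missing:
        raise ValueError(f"missing primary key attribute(s): {missing}")
    table.append(item)


tbl_book = []

# This item has no PubDate, and there is no need to store it as null.
put_item(tbl_book, {"BookTitle": "Title A", "Author": "Author A"})

# A later item may carry an attribute that no earlier item has.
put_item(tbl_book, {
    "BookTitle": "Title B",
    "Author": "Author B",
    "PubDate": "2014-01-01",
    "CoverPhoto": "cover-b.png",
})

for item in tbl_book:
    print(sorted(item))  # each item lists only the attributes it actually has
```

The only check the helper performs is on the primary key attributes, which mirrors the rule that they are the one mandatory part of every item.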
Tip
Unlike RDBMS tables, the attributes (that is, what we call columns in RDBMS) of DynamoDB tables are stored in the item itself as key-value pairs. The attribute name becomes the key and the attribute value becomes the value. So every item will have its own attributes. There is a tradeoff here. Fetching an item fetches not only the attribute values, but also the attribute names. So if you choose very long attribute names, every fetch carries that extra data and efficiency decreases.
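The cost of long attribute names can be sketched with a rough size estimate. The function below is a simplification that handles string values only and counts the UTF-8 bytes of each attribute name plus each value; the attribute names in the example are hypothetical, and real size accounting has additional rules for other data types.

```python
def approx_item_size(item):
    """Rough byte count for an item whose values are all strings.

    Counts the UTF-8 length of every attribute name plus every value.
    """
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


# Same value, different attribute names (both names are hypothetical).
short_name_item = {"Lang": "English"}
long_name_item = {"LanguageOfThisParticularEdition": "English"}

print(approx_item_size(short_name_item))
print(approx_item_size(long_name_item))
```

Both items store the same value, so the difference between the two results comes from the attribute name alone, and it is paid again for every item that uses that name.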
Let's take a look at a few valid table schemas that are supported by DynamoDB:
Let's take a look at a few invalid table schemas:
The schema for Table 1.7 is invalid because it doesn't have the hash key attribute, which is mandatory for creating the table. Table 1.8 is invalid for the same reason. The schemas for Table 1.9 and Table 1.10 are invalid because the hash and range keys must be of type String, Number, or Binary; they cannot be of a Set type. We will discuss the Set data types at the end of this chapter.
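The two rules just stated (a hash key is mandatory, and key attributes must be String, Number, or Binary) can be expressed as a small check. This is a sketch for illustration: validate_key_schema is a hypothetical helper, not part of any DynamoDB SDK, and it takes each key as a (name, type) tuple.

```python
SCALAR_KEY_TYPES = {"String", "Number", "Binary"}


def validate_key_schema(hash_key=None, range_key=None):
    """Return a list of problems; an empty list means the schema is valid.

    Each key is either None or a (name, type) tuple.
    """
    problems = []
    if hash_key is None:
        problems.append("hash key is mandatory")
    for role, key in (("hash", hash_key), ("range", range_key)):
        if key is not None and key[1] not in SCALAR_KEY_TYPES:
            problems.append(
                f"{role} key '{key[0]}' has type {key[1]}; "
                "it must be String, Number, or Binary"
            )
    return problems


# Valid: hash key plus range key, both scalar types.
print(validate_key_schema(("BookTitle", "String"), ("Author", "String")))

# Invalid: a range key on its own, without a hash key.
print(validate_key_schema(None, ("Author", "String")))

# Invalid: a Set type used for a key attribute.
print(validate_key_schema(("BookTitle", "String"), ("Author", "StringSet")))
```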
Once you have had a good look at the valid table schemas, you will probably have the following questions:
- What is the difference between the hash key and the range key?
- What is the difference between the String data type and the StringSet data type?
- Apart from Set, is there any other data type that I should know about?
- During table creation, what mandatory information should I provide?
Let us discuss the answers to these questions, which will help us understand the DynamoDB data model better. Here is the answer to the first question. The hash key and the range key are two attributes that together act as a (compound) primary key. A range key must always be accompanied by a hash key, but a hash key can optionally be accompanied by a range key. The hash key is an attribute that every table must have. With respect to the hash key, a table is an unordered collection of items: items with the same hash key value go to the same partition, but there is no ordering based on hash key values. Within each group of items that share a hash key value, however, the items are always ordered by range key value. After applying these statements to the already-created table, its order will look as follows:
So there is no guarantee that the table data will be sorted by the hash key (that is, BookTitle), but it will be hashed or grouped based on the hash key attribute value. That is the reason why Item1 and Item4 are placed close together. On the other hand, the items are ordered on the range key (that is, Author). That is the reason why the book SCJP authored by Kathy is first, followed by the book authored by Khalid. This answers the first question.
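The grouping and ordering described above can be imitated with a toy model. This is not how DynamoDB partitions data internally; it only reproduces the visible effect. The two SCJP entries are taken from the text, while the third item and the function name group_and_order are hypothetical.

```python
from collections import defaultdict

# Only the two SCJP entries come from the text; the third item is made up.
items = [
    {"BookTitle": "SCJP", "Author": "Khalid"},
    {"BookTitle": "Other Title", "Author": "Someone"},
    {"BookTitle": "SCJP", "Author": "Kathy"},
]


def group_and_order(items, hash_key, range_key):
    """Group items by hash key value, then sort each group by range key."""
    groups = defaultdict(list)
    for item in items:
        groups[item[hash_key]].append(item)
    # The groups are not sorted against each other (they stay in insertion
    # order here), mirroring the lack of ordering across hash key values.
    return {
        value: sorted(group, key=lambda entry: entry[range_key])
        for value, group in groups.items()
    }


layout = group_and_order(items, "BookTitle", "Author")
print([entry["Author"] for entry in layout["SCJP"]])
```

Within the SCJP group, Kathy sorts before Khalid because the range key values are compared as strings.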
An attribute of the type String can hold only a simple string. For example, in the previous table we have two attributes (Language and Language2) to store the edition language of the book. If a book has 10 different language editions, then we would be left with too many attributes in an item (which will reduce fetch efficiency, as discussed earlier). So a better solution is to change the Language attribute from a simple String type to StringSet, as shown in the following table:
The same cannot be done for the Author attribute. Can you guess why? If not, you can go back and take a look at Table 1.9 and Table 1.10. Can you guess now? It's because neither the hash key nor the range key can be of the Set type.
At present, there are only six data types in DynamoDB, namely String, Number, Binary, StringSet, NumberSet, and BinarySet. We will discuss these at the end of this chapter.
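As a rough illustration of the six types, the sketch below maps Python values to the type names listed above. The mapping is an assumption made for this example (str to String, int or float to Number, bytes to Binary, and a Python set to the matching Set type), and the helper names are hypothetical. The sketch also assumes that a set must be non-empty and hold members of a single type.

```python
def scalar_type(value):
    """Return 'String', 'Number', or 'Binary' for a scalar Python value."""
    if isinstance(value, bool):
        raise TypeError("bool is not one of the six types listed")
    if isinstance(value, str):
        return "String"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, (bytes, bytearray)):
        return "Binary"
    raise TypeError(f"unsupported value: {value!r}")


def attribute_type(value):
    """Return one of the six type names for a scalar or a set of scalars."""
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("a set must contain at least one element")
        kinds = {scalar_type(member) for member in value}
        if len(kinds) != 1:
            raise TypeError("all members of a set must share one type")
        return kinds.pop() + "Set"
    return scalar_type(value)


print(attribute_type("English"))              # a single language
print(attribute_type({"English", "German"}))  # several edition languages
print(attribute_type({1, 2, 3}))
print(attribute_type(b"\x00\x01"))
```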
During table creation, there are two scenarios that decide the mandatory parameters needed to create a DynamoDB table.
- Hash primary key: In this scenario, we must (and we can only) provide two parameters. The first parameter is the table name, and the second parameter is the name and type of the hash key.
- Hash and range primary key: In this scenario, we must (and we can only) provide three parameters. The first parameter is the table name, the second parameter is the name and type of the hash key, and the third parameter is the name and type of the range key.
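The two scenarios can be sketched as a single function that assembles the key-related part of a table creation request. The field names (TableName, KeySchema, AttributeDefinitions) and the type codes S, N, and B follow the shape of DynamoDB's low-level CreateTable request; the function name is hypothetical, and using String for BookTitle and Author is an assumption. A real request also needs capacity settings, which are left out here because this section does not cover them.

```python
TYPE_CODES = {"String": "S", "Number": "N", "Binary": "B"}


def build_create_table_params(table_name, hash_key, range_key=None):
    """Build the key-related part of a table creation request.

    hash_key and range_key are (name, type) tuples; range_key is optional.
    """
    keys = [(hash_key, "HASH")]
    if range_key is not None:
        keys.append((range_key, "RANGE"))
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": name, "KeyType": role}
            for (name, _type_name), role in keys
        ],
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": TYPE_CODES[type_name]}
            for (name, type_name), _role in keys
        ],
    }


# Scenario 1: hash primary key only.
hash_only = build_create_table_params("Tbl_Book", ("BookTitle", "String"))

# Scenario 2: hash and range primary key.
hash_and_range = build_create_table_params(
    "Tbl_Book", ("BookTitle", "String"), ("Author", "String")
)

print(hash_only["KeySchema"])
print(hash_and_range["KeySchema"])
```

Note that no other attribute appears anywhere in the request: only the key attributes are declared, in line with the two scenarios above.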
There are different interfaces available to interact with DynamoDB. Take a look at Chapter 2, DynamoDB Interfaces, to know more about the interfaces. We are now done with the basics of this chapter.