Non-relational databases
After the break, Berta wants to finish her explanation with non-relational databases. As there are several types she wants to cover, she starts right away.
Berta: AWS also offers many different non-relational databases. Each one is suited to a specific use case: graph databases, for example.
Alex: These are for graphics?
Berta: No, graph, not graphics. You can store different entities and the relationships between them, building a complex graph. They can be used in recommendation engines, fraud detection, identity graphs, and similar applications. For that purpose, Amazon Neptune would be the right choice.
You can also have in-memory databases, for extreme performance, or time-series databases, where you analyze data over time, such as for the Internet of Things, stock prices, or measurements such as temperature or pressure; anything where the time sequence is important. You can use Amazon ElastiCache as an in-memory cache and Amazon Timestream for time-series storage and analysis.
Harold: I think I have read something about an immutable database?
Berta: Yes, that is QLDB, Amazon Quantum Ledger Database. It is a database where the change log cannot be modified or tampered with. It’s very useful for legal proof, or just to maintain the sequence of changes as a fully historical log of all activity that can be verified.
Alex: This is great. I like the idea of having purpose-built databases, rather than trying to use only one type of database. But these databases seem too specialized. Is there any general-purpose database that is also non-relational?
Berta: Sure. There is one called Amazon DynamoDB. It is a non-relational database, supporting both key-value and document data models. Being non-relational, it doesn't enforce a fixed schema. So, each row in DynamoDB, called an item, can have any number of attributes (its equivalent of columns) at any moment. That means your tables can adapt to your changing business requirements, without having to stop the database to modify the previous schema:
Figure 6.15 — Key-value and document model
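To make the schema flexibility concrete, here is a minimal sketch using boto3, the AWS SDK for Python. The Customers table, its key, and its attributes are hypothetical; note that the two items share only the key attribute:

```python
import boto3

# Assumes AWS credentials are configured and a hypothetical table named
# "Customers" already exists, with CustomerId as its partition key
table = boto3.resource("dynamodb").Table("Customers")

# Two items in the same table with completely different attribute sets;
# only the key attribute (CustomerId) is mandatory
table.put_item(Item={
    "CustomerId": "C001",
    "Name": "Alice",
    "Email": "alice@example.com",
})
table.put_item(Item={
    "CustomerId": "C002",
    "Name": "Bob",
    "LoyaltyPoints": 1200,
    "Address": {"City": "Madrid", "Country": "Spain"},
})
```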
Harold: So, if the data is stored as tables, how is it different from relational databases?
Berta: The key here is that there’s only one table, without any relationships. If your application requires them, you can surely create multiple tables, but they will be completely independent, in separate databases. The application will have to perform the joining logic. You usually have to design the table for all the queries you might anticipate, including all the data and possible indexes.
Also, traditional relational databases have one endpoint for control operations, used to create and manage tables, and a separate endpoint for data operations: the create, read, update, and delete (CRUD) actions on the data in a table. DynamoDB simplifies this: it offers a single endpoint that accepts all types of requests. Amazon DynamoDB is serverless, so you don't have to worry about any of the operational overhead. Also, DynamoDB supports both eventual and strong consistency.
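A short boto3 sketch illustrates the single endpoint: both the control operation (create_table) and the data operation (put_item) go through the same client. The table name and region are again hypothetical:

```python
import boto3

# One client, one endpoint, for both control-plane and data-plane calls
client = boto3.client("dynamodb", region_name="eu-west-1")

# Control operation: create a table; only the key schema is declared up front
client.create_table(
    TableName="Customers",
    KeySchema=[{"AttributeName": "CustomerId", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "CustomerId", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
client.get_waiter("table_exists").wait(TableName="Customers")

# Data operation (CRUD) through the very same client
client.put_item(
    TableName="Customers",
    Item={"CustomerId": {"S": "C001"}, "Name": {"S": "Alice"}},
)
```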
Harold: Could you please explain it?
Berta: Sure. A database consistency model defines the mode and timing in which a successful write or update is reflected in a later read operation of that same value. Let us consider an example to explain it. Do you use a credit or debit card?
Harold: I use both. For cash withdrawals, I use a debit card, and for purchases, a credit card.
Berta: Good. So, if you have withdrawn some money using your debit card and you immediately check your account balance again, will the recent withdrawal reflect in the account statement?
Harold: Yes. It will.
Berta: And if you made a purchase with your credit card, will it also reflect it at the same time?
Harold: I think it shows the transaction in a pending state; it doesn’t show as completed immediately.
Berta: Correct. Credit card processing works slightly differently. The vendor from whom you have purchased the product has to claim a settlement of the transaction. Eventually—by that, I mean after some time—the transaction will show as completed in your account.
DynamoDB always stores multiple copies of your data. Let's assume it keeps three copies. At any time, one of the copies is chosen as the leader. Every time a write or update request is initiated, DynamoDB will ensure that at least two copies (the leader and one more) are immediately updated to reflect the change. The third copy will hold stale data for some time, but eventually it will also be updated to reflect the change.
Harold: But why not update all the copies in the first place?
Berta: Because of the performance impact. If DynamoDB had to wait for all three copies to confirm the write, the application that issued it would have to wait for the slowest node. Imagine you want to host a meeting with three people in different time zones; you would have to find a common timeslot that suits all three participants. The problem becomes simpler when the meeting requires only two of them to attend.
Harold: Oh, I get it now. It's another quorum algorithm, similar to the one used in Aurora. A majority of storage nodes makes the decision. So, the third copy is still waiting to be updated, but the acknowledgment of the write has already been sent to the application. This means my data in different copies is not consistent, but at least two copies will have the latest data.
Berta: Yes, for some time. This is sometimes referred to as data being in a soft state. But there are options available in DynamoDB if you want to always read the most up-to-date data.
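Here is a small, purely illustrative Python simulation of the idea. It is not how DynamoDB is actually implemented; it is just a toy model of a two-out-of-three write quorum and the stale reads that can follow:

```python
import random

# Toy model: three replicas of one item, all starting in sync
replicas = [{"price": 100}, {"price": 100}, {"price": 100}]

def quorum_write(new_price):
    """Acknowledge once the leader plus one follower (2 of 3) are updated."""
    leader, follower, straggler = random.sample(range(3), 3)
    replicas[leader]["price"] = new_price
    replicas[follower]["price"] = new_price
    # The straggler catches up later; we leave it stale here to make
    # the "soft state" visible
    return straggler

def eventually_consistent_read():
    """Read from any replica at random: the result may be stale."""
    return random.choice(replicas)["price"]

stale_index = quorum_write(105)
print(eventually_consistent_read())    # prints 105 or, one time in three, 100
print(replicas[stale_index]["price"])  # the straggler still holds 100
```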
Alex: But why would someone be interested in reading stale data in the first place?
Berta: For better performance. Let me give you an example. Let's say you have stored stock prices for a company in DynamoDB. Consider the price of the stock to be $100, and all three copies currently have the same value of $100. Now, you read this data in two different applications. Application one is a news scroll that displays the current price of the stock, and application two is a financial application with which you can buy or sell stocks.
Alex: Okay.
Berta: If you have a news scroll application, you could add a disclaimer such as "This data is delayed by 15 minutes" and display the data from DynamoDB. In this case, accuracy is not that important, as you are okay with data delayed by 15 minutes. DynamoDB will never supply you with wrong or random data, but it might give you data that is stale: it was accurate a while ago, but currently it may or may not be. As there are three copies, and your read request can land on any copy, there is a one-in-three chance that you may get stale data. But this method will always return the data with the lowest latency. This is called eventual consistency.
Alex: Agreed – if you don’t specify the node, your query might end up in any of them.
Berta: Now, if you want to use the same stock data for a financial application – this means your priority is accuracy rather than speed. You always need to get the most up-to-date data for any financial transaction. In DynamoDB, you can indicate—by using a parameter in your query—that the reader needs the most recent, accurate data. This time, DynamoDB will find out which is the leader and will deliver data from it; this way, you’ll get the most up-to-date data. This is called strong consistency.
Alex: That’s nice. Based on your read requirement you can choose to have eventually consistent data or strongly consistent data. I like the flexibility it offers.
Berta: Eventual consistency is the default mechanism. If a requester doesn't specify any parameter and just issues a read request, DynamoDB treats it as an eventually consistent read.
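In boto3, the choice is a single parameter on the read call. A minimal sketch, assuming a hypothetical StockPrices table keyed by Ticker:

```python
import boto3

table = boto3.resource("dynamodb").Table("StockPrices")  # hypothetical table

# Default: eventually consistent read; lowest latency, may return stale data
news_scroll = table.get_item(Key={"Ticker": "ACME"})

# Strongly consistent read: served from the leader, always up to date
trading_app = table.get_item(Key={"Ticker": "ACME"}, ConsistentRead=True)
```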
Harold: So, all write requests are always consistent; it’s when you read that you select eventual consistency or strong consistency.
Berta: That’s correct.
Charles: In the traditional world, the performance of a database is based on the server it is running on. How is the performance of DynamoDB controlled?
Berta: In the case of DynamoDB, you have to configure Read Capacity Units (RCUs) and Write Capacity Units (WCUs) to achieve a specific performance. These are table-level settings. An RCU defines the number of strongly consistent reads per second of items up to 4 KB in size. Eventually consistent reads use half the provisioned read capacity. So, if you configured your table for 10 RCUs, you could perform 10 strongly consistent read operations, or 20 eventually consistent read operations (double the amount), of 4 KB each, per second. A WCU is the number of writes per second of items up to 1 KB in size.
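The arithmetic behind these units can be captured in a few lines of Python. This is a simplified model of the sizing rules just described, ignoring refinements such as transactional operations:

```python
import math

def rcus_needed(item_size_kb, reads_per_second, strongly_consistent=True):
    # One RCU = one strongly consistent read/second of an item up to 4 KB;
    # eventually consistent reads need half as much capacity
    units = math.ceil(item_size_kb / 4) * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)

def wcus_needed(item_size_kb, writes_per_second):
    # One WCU = one write/second of an item up to 1 KB
    return math.ceil(item_size_kb / 1) * writes_per_second

print(rcus_needed(4, 10))         # 10 RCUs: 10 strongly consistent 4 KB reads/s
print(rcus_needed(4, 20, False))  # 10 RCUs also cover 20 eventual 4 KB reads/s
print(wcus_needed(1, 10))         # 10 WCUs: 10 writes/s of 1 KB items
```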
Charles: Okay. What I don’t understand is how many RCUs or WCUs are required for a new application, or for an application with spiky or unpredictable access patterns.
Berta: Amazon DynamoDB has you covered. It has two capacity modes: on-demand and provisioned. In on-demand mode, DynamoDB instantly accommodates your workloads as they ramp up or down. So, if you have a new table with an unknown workload, an application with unpredictable traffic, or you want to pay only for what you actually use, on-demand mode is a great option. If you choose provisioned mode, you have to specify the number of reads and writes per second your application needs. So, if your application has predictable, consistent traffic and you want to control costs and pay a specific amount, provisioned mode is better.
Charles: And this mode has to be selected at table creation time or can it be modified later?
Berta: You can set the read/write capacity mode at table creation, or you can change it later too, either manually or programmatically.
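Programmatically, the capacity mode is the BillingMode setting on the table. A small boto3 sketch, switching a hypothetical table from on-demand to provisioned capacity:

```python
import boto3

client = boto3.client("dynamodb")

# Switch an existing table from on-demand (PAY_PER_REQUEST) to provisioned;
# note that AWS limits how frequently a table can switch between modes
client.update_table(
    TableName="Customers",
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```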
Harold: By the way, you mentioned some non-relational databases also support querying through SQL?
Berta: Yes. DynamoDB supports PartiQL, an open source, SQL-compatible query language. Furthermore, you can use a client-side GUI tool called NoSQL Workbench for Amazon DynamoDB, which provides data modeling, data visualization, and query development features for DynamoDB tables.
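As a quick illustration, a PartiQL statement can be run through boto3's execute_statement call; the Customers table remains hypothetical:

```python
import boto3

client = boto3.client("dynamodb")

# PartiQL: SQL-compatible syntax against a DynamoDB table
response = client.execute_statement(
    Statement="SELECT * FROM Customers WHERE CustomerId = ?",
    Parameters=[{"S": "C001"}],
)
print(response["Items"])
```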
I think we now have enough information to map our existing databases to AWS services. Let’s start listing all the databases that we plan to migrate to AWS and work as a team to identify possible migration methods.