Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Database Design and Modeling with Google Cloud

You're reading from   Database Design and Modeling with Google Cloud Learn database design and development to take your data to applications, analytics, and AI

Arrow left icon
Product type Paperback
Published in Dec 2023
Publisher Packt
ISBN-13 9781804611456
Length 234 pages
Edition 1st Edition
Concepts
Arrow right icon
Author (1):
Arrow left icon
Abirami Sukumaran Abirami Sukumaran
Author Profile Icon Abirami Sukumaran
Abirami Sukumaran
Arrow right icon
View More author details
Toc

Table of Contents (18) Chapters Close

Preface 1. Part 1:Database Model: Business and Technical Design Considerations
2. Chapter 1: Data, Databases, and Design FREE CHAPTER 3. Chapter 2: Handling Data on the Cloud 4. Part 2:Structured Data
5. Chapter 3: Database Modeling for Structured Data 6. Chapter 4: Setting Up a Fully Managed RDBMS 7. Chapter 5: Designing an Analytical Data Warehouse 8. Part 3:Semi-Structured, Unstructured Data, and NoSQL Design
9. Chapter 6: Designing for Semi-Structured Data 10. Chapter 7: Unstructured Data Management 11. Part 4:DevOps and Databases
12. Chapter 8: DevOps and Databases 13. Part 5:Data to AI
14. Chapter 9: Data to AI – Modeling Your Databases for Analytics and ML 15. Chapter 10: Looking Ahead – Designing for LLM Applications 16. Index 17. Other Books You May Enjoy

Choosing the right database

Having assessed all these questions and considerations, the logical next step is to choose from/eliminate from the database types out there, predominantly focusing on the structured and less structured types of data:

  • Relational databases:
    • Online transaction processing (OLTP)
    • Online analytical processing (OLAP)
  • NoSQL databases:
    • Document database
    • Key-value database
    • Wide-column database
    • Graph database

Let’s look at these different categories. We will see some examples of some of these categories in the next chapter.

Relational database

The first category we are going to look at is the relational database. There are two broad categories of relational databases: OLTP and OLAP systems.

Online transaction processing

We have the good old relational database for OLTP, which typically follows these normalization rules:

  1. First normal form: Each column in the table has atomic value, contains no duplicates, and contains a primary key, which is one or more ordered columns that uniquely identify a row.
  2. Second normal form: A relation is in the second normal form if the first normal form is satisfied and a separate table is used to store all the values that apply to multiple rows and linked using foreign keys. A foreign key is one or more ordered columns that relate to a primary key in another table.
  3. Third normal form: A relation is in the third normal form if the second normal form is satisfied and if the column that is transitively dependent on the primary key should be eliminated and moved to another table, along with the determinant. So, if attribute A is the primary key, A is dependent on B, and B is dependent on C, then A to C form transitive dependency. So, the dependency between B and C should be moved to a different table.

Transactional structured data is usually operated one row at a time. For example, consider an application where you are looking up employee information. In this case, you will need a lot of attributes (columns) from a single row. So, it is efficient to store all row attributes together and retrieve them in a block. This is what is called row-oriented storage.

Online analytical processing

OLAP is typically used for data mart and data warehouse applications. This type requires Structured Query Language (SQL) to define, manipulate, query, and manage. They usually facilitate the following:

  • Aggregations and roll-ups
  • Drill down
  • Pivots
  • Slicing and dicing

For example, imagine a scenario where your business or data analyst needs to retrieve reporting summaries such as total employees joined in the last month and total cost to the company incurred. In this case, you will only query three columns. So, for all the rows selected, instead of retrieving all the columns, it makes sense to only retrieve three. This is a common pattern in analytical applications, and they use a column-oriented storage mechanism.

NoSQL database

There are NoSQL databases for semi-structured data – that is, data less structured than fully structured tabular data. There are a few types of NoSQL databases. There is no formal model or normalization requirement for this type.

Sometimes, it’s misleading when we hear that NoSQL options are schema-less. NoSQL options may not have a schema in the same pattern as relational databases, but they do have a structure to represent data. Four broad types of NoSQL databases are loosely based on a specific way of storing data. Let’s look at the logic for a data model in each case.

Document database

A document database stores data in document format, similar to a JSON document. Each document stores pairs of fields and values, with a wide variety of data types and data structures used as values. Document databases allow you to keep adding attributes as needed without making changes to the database schema each time.

For example, consider the scenario where you cannot fix all the attributes at the design stage and have to add more attributes as the process evolves, such as in the case of a retail store selling electronic equipment where a laptop has memory, a processor, a charger, and other configuration attributes and a washing machine has another set of attributes, such as weight, power, length, width, and so on. To search efficiently by these attributes, document databases allow for indexes. If you want to search an equipment by length, then you can create an index on length.

Key-value database

The key-value model consists of a key and a value, and it is the simplest type of database in terms of understanding and usage. The data model has two parts: data and a string associated with the data. Data can be accessed via direct request (send the key and get the data) rather than using any kind of query language. Key-value databases are simple if you have to search only on your key. However, if your value-data structure is complex, for example, a JSON object is stored as a value, and if you want to be able to search on items within the JSON structure, then a key-value database is not the best option. In that case, a document database would be a better option.

The following table is an example of a key-value database:

Key

Value

Key1

Value1

Key2

Value2

Key3

Value1

Key4

Value3

Table 1.1 – Key-value example table

Let’s discuss key-value databases next.

Wide-column database

Wide-column databases follow a table form structure that is both flexible and scalable. Each row has a key and one or more associated columns, called column families. Each row’s key-column family is allowed to have as many numbers of columns and the columns can have as many kinds of data as possible. Data is accessed using a query language. This type of column structure allows for faster summary (aggregation) queries. Wide-column databases are designed in such a way that they respond to specific queries. Instead of using indexes to allow efficient lookup of rows, wide-column databases store data in such a way that rows with similar keys (row keys are like primary keys in relational databases) are close together.

For example, consider log data from an application. The following table represents data that is organized to answer queries looked up by event ID and then time:

Event ID

Time

Description

1

5431023867

Event successful

2

5431023869

Error looking up…

3

5431023868

Finished with error

Table 1.2 – Table organized by Event ID and Time

The preceding table is not applicable for querying events in the past hour because it is not organized by time first. The following table, on the other hand, is organized to answer queries by time range. Note that a new table with the desired schema needs to be created to accomplish this. Wide-column databases do not use indexes for queries:

Time

Event ID

Description

5431023867

1

Event successful

5431023868

3

Finished with error

5431023869

2

Error looking up…

Table 1.3 – Table organized by Time and Event ID

Let’s move on to graph database next.

Graph database

Graph databases consist of nodes and edges. Nodes are connected by edges. Data is stored in the nodes and information about how the nodes are related is stored in the edges. Node and edge data is typically retrieved using query languages, Graph Query Language (GQL), and sometimes SQL as well. There are two types of query methods we can use to retrieve data from graph databases:

  • Cypher Query Language, which specifies SQL-like statements that describe patterns in a graph
  • Traversal Language, which specifies how to navigate from one node to another in the graph

People and connections are a great example of a use case for graph databases. A person could be designed as the node and the relationship of that person with other persons could be the links, also known as edges.

In the following diagram, persons 1, 2, 3, 4, 5, and 6 are all people and the link between them denotes friends:

Figure 1.2 – Representation of nodes and edges in a graph database

Figure 1.2 – Representation of nodes and edges in a graph database

We will look into some of these categories in detail later in this book. For now, let’s summarize what we’ve learned in this chapter.

You have been reading a chapter from
Database Design and Modeling with Google Cloud
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781804611456
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime