The previous section describes how modeling varies between operational and data warehouse scenarios. Before exploring the modeling process in detail, it’s helpful to understand the look and feel of relational and transformational modeling and what we’re working toward. Before proceeding, it would help to summarize the main differences between transactional databases and data warehouses. You can see what these are in the following table:
Transactional Database
|
Data Warehouse
|
Supports daily operations
|
Provides operational insight
|
Operates on single records
|
Summarizes many records
|
Accurate as of the present instant
|
Historical snapshots over time
|
Single source of truth (SSOT), non-redundant
|
Redundant to support different analyses
|
Data models defined by business operations
|
Data models generated by business questions
|
Static and structured data model
|
Inherited structure and dynamically transformed data model
|
Single-application data
|
Multiple sources of converging data
|
Figure 1.4 – Common differences between transactional databases and warehouses
Given these differences, the following sections demonstrate what modeling looks like in each system and what it aims to achieve.
What modeling looks like in operational systems
Completely ignoring the modeling workflow that got us here, which will be covered in later chapters, we can observe an example of the type of modeling most commonly seen in transactional systems. The physical diagram in Figure 1.5 serves both as a blueprint for declaring the required tables and a guide to understanding their business context.
Following modeling conventions (don’t worry if they are still unfamiliar—they will be covered thoroughly in the coming chapters), we can infer a lot of information from this simple diagram. For example, a person is uniquely identified by an eight-digit identifier (the primary key) and must have a Social Security number (SSN), driver’s license, name, and birth date.
The one-to-many relationship between the two tables establishes that while a person does not necessarily need to have an account created, an account must belong to just one person:
Figure 1.5 – A physical model using crow’s foot notation
These details, combined with the list of attributes, data types, and constraints, not only dictate what kinds of data can be written to these tables but also provide an idea of how the business operates. So, how does this differ in analytical databases?
What modeling looks like in analytical systems
In a data warehouse scenario, the PERSON
and ACCOUNT
tables would not be defined from scratch—they would be extracted from the source in which they exist and loaded—bringing both structure and data into the process. Then, the analytical transformations begin in answer to the organization’s business questions. This is a process known as Extract Transform Load (ETL). (Although ELT has become the preferred processing order, the original term stuck.)
Suppose the management team wanted to analyze which age groups (by decade) were opening which account types and they wanted to store the result in a separate table for an independent analysis.
The following diagram shows the resulting relational model of an object obtained through transformational analysis but provides no business context:
Figure 1.6 – A relational model of a transformational requirement
Although physical modeling could describe such a table (as seen in Figure 1.6)—containing the account type with age and count of accounts as integers—such a model would fail to communicate the most relevant details, presented here:
- The logic used to perform the analysis
- The relationship between the source tables and the output
The business requirement for ACCOUNT_TYPE_AGE_ANALYSIS
in this example purposely excludes the source key fields from the target table, preventing the possibility of establishing any relational links. However, the relational model still serves a vital role: it tells us how the sources are related and how to join them correctly to produce the required analysis.
The logic could then be constructed by joining PERSON
and ACCOUNT
, as shown here:
CREATE TABLE account_types_age_analysis AS
SELECT
a.account_type,
ROUND(DATEDIFF(years, p.birth_date, CURRENT_DATE()), -1
) AS age_decade,
COUNT(a.account_id) AS total_accounts
FROM account AS a
INNER JOIN person AS p
ON a.person_id = p.person_id
GROUP BY 1, 2;
Although there is no relational connection between ACCOUNT_TYPE_AGE_ANALYSIS
and its sources, there is still a clear dependency on them and their columns. Instead of using ERDs, which convey entities and relationships, transformational pipelines are visualized through a lineage diagram. This type of diagram gives a column-level mapping from source to target, including all intermediate steps, as shown here:
Figure 1.7 – Transformational modeling seen visually
Paired with the SQL logic used to construct it, the lineage graph gives a complete picture of the transformational relationship between sources and targets in an analytical/warehousing scenario.
Having witnessed both relational and analytical approaches to modeling, it is clear that both play a vital role in navigating the complex dynamic environments that one is liable to encounter in an enterprise-scale Snowflake environment.
Although we have only skimmed the surface of what modeling entails and the unique features of the Snowflake platform that can be leveraged to this end, this chapter has hopefully given you an idea of the vital role that modeling plays in building, maintaining, and documenting database systems. Before diving into the specifics of verbal, technical, and visual modeling semantics of modeling in the chapters to come, let’s review what we learned.