A sample scenario for choosing the Snowflake schema
Here's an example of a design decision process that would lead you to a Snowflake dimension. Start by assuming that all the dimensions in the Data Mart (versus the Data Warehouse, where we may have different ideas) will be modeled as Stars.
We start in our first design with a single dimension, Geography, containing the following columns:
skGeography
(surrogate key)PostalCode
(business key)CityID
CityName
StateID
StateName
CountryID
CountryName
We have one fact source table containing, say, population data with the following columns:
CensusDate
PostalCode
PopulationCount
In ETL, we would join this source table to the dimension table on the business key PostalCode
to retrieve the surrogate key and use this to load the data mart fact table:
CensusDate
skGeography
PopulationCount
Now, let's introduce a second fact source table containing projected population data, but with a different grain. Let's assume this data comes in, not at the Postal Code grain but rather at the State grain. We'd have a source table with columns such as follows:
ProjectionDate
StateID
ProjectedGrowth
We can't join this new source table to our existing Geography dimension because if we do so, we will get back many surrogate keys—each representing one postal code within the specified state. So, we need to Snowflake (partially normalize) the Geography dimension so that it will support the grain of each of our fact source tables, giving us two dimension tables similar to the the following two bullet lists:
dimGeography
:
skGeography
PostalCode
CityID
CityName
skGeographyState
and dimGeographyState
:
skGeographyState
StateID
StateName
CountryID
CountryName
Notice that we did not fully normalize the dimension (postal code and city both exist in the first table, state and country in the second). We just normalized the dimension enough to give us a single relationship between each of our two facts and this dimension.