This recipe assumes that the model in the recipe Managing the appearance of tables and fields has been created.
The reader should recognize that the model is designed to show sales information by product, date, and customer. This type of modeling scenario is commonly referred to as a star schema and is shown in the following diagram. The Sales
table is referred to as a fact table (since it stores the data facts that we wish to analyze—sales amount, tax amount, and so on) and the other tables are referred to as dimension (subject) tables because they hold descriptive information.
Extending the model further, the Products
table is linked to the Subcategory
table, and the Subcategory
table is linked to the Category
table. This is shown in the following diagram and is sometimes called a snowflake schema, since the dimension tables are not directly connected to the fact table:
An important point to note, is that each dimension table has a unique identifying field, for example, a product can be uniquely identified in the Products
table through the product_id
field. This is commonly referred to as the primary key for the table.
In contrast, the referring column (product_id
in the Sales
table) can have many occurrences of the product_id
field and is commonly referred to as the foreign key.
The model has been extended to show two things. Firstly, by defining relationships between tables within the model, we have defined the filtering path for the data. This path is used to restrict rows between tables that have a relationship. Secondly, by adding a calculation (Sum of total_price and Count of Product ID), we have created measures that apply an aggregation function to the model fields. These are special types of measures within PowerPivot and are referred to as implicit measures (because the model implicitly defines a calculation for the field).
Relationships define how one table relates to another. In order to define a relationship, the join must occur on a field that has unique values in one of the tables (this is commonly called a primary key). The table that has the field with unique values is commonly called the related table. This can be seen in the diagram view, as shown in the following screenshot with the direction of the arrows on the relationships. Consider the Products
table (which has a unique field product_id
) that is related to the Sales
table (through the product_id
field in that table), but only the Products
table needs to have a unique product_id
. It is also said that the product_id
field relates to many records in the Sales
table. This can be seen by the direction of the arrow between Sales
and Products
, the related table has the arrow pointing towards it.
Relationships are important because they define how data is filtered and calculated when it is presented to the user.
Relationships are the primary mechanisms with the model that are used to filter data and perform calculations. That is, the relationship defines how data is filtered when values are shown to the user. Although this is a new concept, the concept of relationships is important because they have important implications with the way that the model determines what data to show to the user. Consider the pivot table shown in the following screenshot—Subcategory
on rows and Sum of total_price
, Count of Product ID
, and Count of category_id
as measures:
Now, consider the relationship defined in the model. This is summarized in the following screenshot:
The rows in the pivot show the subcategory which defines a filter for each row (that is a filter for each subcategory). This filter can then be applied to the Products
table, which in turn is applied to the Sales
table. It might be better to say that the rows of the Sales
table are filtered by the Products
table and then those rows are filtered by the Subcategory
table. This is why the calculations Sum of total_price
and Count of Product ID
show the correct values. The filter on rows of the Sales
table and rows of the Products
table can be applied in the direction of the arrows of the relationships.
However, this is not the case when Subcategory
is shown with data from the Category
table—a filter will only be applied in the direction that a relationship is created. This is why the calculation Count of category_id
shows the same number for each subcategory. With the subcategory on rows, a filter is created which can filter the Products
table but this filter cannot then applied in an upstream manner to the Category
table.
The application of filters may seem unintuitive at first, especially with a relationship design such as the one among Products
, Category
, and Subcategory
, but in reality the model should be designed so that the filters can be applied in a single direction. There is also, the question of unmatched values between fields used in the relationship and how they are treated by the model. For example, what would happen if we had a product_id
field in the Sales
table that did not have a match in the Products
table? Would this even be allowed in the model? The tabular model handles this situation very elegantly. The model allows this situation (without error), and unmatched values are assigned to a blank placeholder. For example, if there was a product in the Sales
table and no product in the Products
table, it would be shown as blank when Products
, Category
, or Subcategory
is used in the pivot.
We have also indicated that the model automatically created implicit measures. The term measure is common in business intelligence tools to specify that a calculated value is returned. Often, this can be a simple calculation, for example, the count of rows or the sum of a field. The important thing to remember is that measure is a single value that is returned from the model (when the model is filtered). Usually, measures are defined by a model designer, but they need not be. This is the case with an implicit measure. An implicit measure is defined automatically, depending on the data type of the column that is being used. Numeric columns are automatically summed, whereas text columns are automatically counted.