Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering Microsoft Power BI

You're reading from   Mastering Microsoft Power BI Expert techniques for effective data analytics and business intelligence

Arrow left icon
Product type Paperback
Published in Mar 2018
Publisher Packt
ISBN-13 9781788297233
Length 638 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Brett Powell Brett Powell
Author Profile Icon Brett Powell
Brett Powell
Arrow right icon
View More author details
Toc

Table of Contents (15) Chapters Close

Preface 1. Planning Power BI Projects FREE CHAPTER 2. Connecting to Sources and Transforming Data with M 3. Designing Import and DirectQuery Data Models 4. Developing DAX Measures and Security Roles 5. Creating and Formatting Power BI Reports 6. Applying Custom Visuals, Animation, and Analytics 7. Designing Power BI Dashboards and Architectures 8. Managing Application Workspaces and Content 9. Managing the On-Premises Data Gateway 10. Deploying the Power BI Report Server 11. Creating Power BI Apps and Content Distribution 12. Administering Power BI for an Organization 13. Scaling with Premium and Analysis Services 14. Other Books You May Enjoy

Dataset design process

With the data warehouse bus matrix as a guide, the business intelligence team can work with representatives from the relevant business teams and project sponsors to complete the following four-step dataset design process:

  1. Select the business process.
  2. Declare the grain.
  3. Identify the dimensions.
  4. Define the facts.

Selecting the business process

Ultimately each business process will be represented by a fact table with a star schema of many-to-one relationships to dimensions. In a discovery or requirements gathering process it can be difficult to focus on a single business process in isolation as users regularly analyze multiple business processes simultaneously or need to. Nonetheless, it's essential that the dataset being designed reflects low level business activities (for example, receiving an online sales order) rather than a consolidation or integration of distinct business processes such as a table with both online and reseller sales data:

  • Confirm that the answer provided to the first question of the project template regarding data sources is accurate:
    • In this project, the required business processes are Internet Sales, Reseller Sales, Annual Sales and Margin Plan
    • Each of the three business processes corresponds to a fact table to be included in the Power BI dataset
  • Obtain a high-level understanding of the top business questions each business process will answer:
    • For example, "What are total sales relative to the Annual Sales Plan and relative to last year?"
    • In this project, Internet Sales and Reseller Sales will be combined into overall corporate sales and margin KPIs
  • Optionally, reference the data warehouse bus matrix of business processes and their related dimensions:
    • For example, discuss the integration of inventory data and the insights this integration may provide
    • In many projects, a choice or compromise has to be made given the limited availability of certain business processes and the costs or timelines associated with preparing this data for production use:
      • Additionally, business processes (fact tables) are the top drivers of the storage and processing costs of the dataset and thus should only be included if necessary.
A common anti-pattern to avoid in Power BI projects is the development of datasets for specific projects or teams rather than business processes. For example, one dataset would be developed exclusively for the marketing team and another dataset would be created for the sales organization. Assuming both teams require access to the same sales data, this approach naturally leads to a waste of resources, as the same sales data would be queried and refreshed twice and both datasets would consume storage resources in the Power BI service. Additionally, this isolated approach leads to manageability and version control issues, as the datasets may contain variations in transformation or metric logic. Therefore, although the analytical needs of specific business users or teams are indeed the priority of BI projects, it's important to plan for sustainable solutions that can ultimately be shared across teams.

Declaring the grain

All rows of a fact table should represent the individual business process from step 1 at a certain level of detail or grain such as the header level or line level of a purchase order. Therefore, each row should have the same meaning and thus contain values for the same key columns to dimensions and the same numeric columns.

The grain of fact tables ultimately governs the level of detail available for analytical queries as well as the amount of data to be accessed:

  • Determine what each row of the different business processes will represent:
    • For example, each row of the Internet Sales fact table represents the line of a sales order from a customer
    • The rows of the Sales and Margin Plan, however, are aggregated to the level of a Calendar Month, Products Subcategory, and Sales Territory Region
If it's necessary to apply filters or logic to treat certain rows of a fact table differently than others, the fact table likely contains multiple business processes (for example, shipments and orders) that should be split into separate tables. Although it's technically possible to build this logic into DAX measure expressions, well-designed fact tables benefit Power BI and other data projects and tools over the long term. The same metrics and visualizations can be developed via separate fact tables with their own relationships rather than consolidated fact tables.
  • Review and discuss the implications of the chosen grain in terms of dimensionality and scale:
    • Higher granularities provide greater levels of dimensionality and thus detail but result in much larger fact tables
    • If a high grain or the maximum grain is chosen, determine the row counts per year and the storage size of this table once loaded into Power BI datasets
    • If a lower grain is chosen, ensure that project stakeholders understand the loss of dimensionalities, such as the inability to filter for specific products or customers
In general, a higher granularity is recommended for analytical power and sustainability. If a less granular design is chosen, such as the header level of a sales order, and this grain later proves to be insufficient to answer new business questions, then either a new fact table would have to be added to the dataset or the existing fact table and all of its measures and dependent reports would have to be replaced.

Identifying the dimensions

The dimensions to be related to the fact table are a natural byproduct of the grain chosen in step 2 and thus largely impact the decision in step 2. A single sample row from the fact table should clearly indicate the business entities (dimensions) associated with the given process such as the customer who purchased an individual product on a certain date and at a certain time via a specific promotion. Fact tables representing a lower grain will have fewer dimensions. For example, a fact table representing the header level of a purchase order may identify the vendor but not the individual products purchased from the vendor:

  • Identify and communicate the dimensions that can be used to filter (aka slice and dice) each business process:
    • The foreign key columns based on the grain chosen in the previous step reference dimension tables.
    • Review a sample of all critical dimension tables, such as Product or Customer, and ensure these tables contain the columns and values necessary or expected.
  • Communicate which dimensions can be used to filter multiple business processes simultaneously:
    • In this project, the Product, Sales Territory, and Date dimensions can be used to filter all three fact tables.
    • The data warehouse bus matrix referenced earlier can be helpful for this step
  • Look for any gap between the existing dimension tables and business questions or related reports:
    • For example, existing IT-supported reports may contain embedded logic that creates columns via SQL which are not stored in the data warehouse
  • Strive to maintain version control for dimension tables and the columns (attributes) within dimension tables:
    • It may be necessary for project stakeholders to adapt or migrate from legacy reports or an internally maintained source to the Corporate BI source

A significant challenge to the identity of the dimensions step can be a lack of Master Data Management (MDM) and alternative versions. For example, the sales organization may maintain their own dimension tables in Excel or Microsoft Access and their naming conventions and hierarchy structures may represent a conflict or gap with the existing data warehouse. Additionally, many corporate applications may store their own versions of common dimensions, such as products and customers. These issues should be understood and, despite pressure to deliver BI value quickly or according to a specific business team's preferred version, the long-term value of a single definition for an entire organization as expressed via the bus matrix should not be sacrificed.

Defining the facts

The facts represent the numeric columns to be included in the fact table. While the dimension columns from step 3 will be used for relationships to dimension tables, the fact columns will be used in measures containing aggregation logic such as the sum of a quantity column and the average of a price column:

  • Define the business logic for each fact that will be represented by measures in the dataset:
    • For example, gross sales is equal to the extended amount on a sales order, and net sales is equal to gross sales minus discounts
  • Any existing documentation or relevant technical metadata should be reviewed and validated
  • Similar to the dimensions, any conflicts between existing definitions should be addressed so that a single definition for a core set of metrics is understood and approved
  • Additionally, a baseline or target source should be identified to validate the accuracy of the metrics to be created.
    • For example, several months following the project, it should be possible to compare the results of DAX measures from the Power BI dataset to an SSRS report or a SQL query
    • If no variance exists between the two sources, the DAX measures are valid and thus any doubt or reported discrepancy is due to some other factor
See Chapter 2, Connecting to Sources and Transforming Data with M, Chapter 3, Designing Import and DirectQuery Data Models, and Chapter 4, Developing DAX Measures and Security Roles, for details on the fact table columns to include in Power BI datasets (for import or DirectQuery) and the development of DAX metric expressions. The fact definitions from this step relate closely to the concept of base measures described in Chapter 4, Developing DAX Measures and Security Roles. Ultimately, the DAX measures implemented have to tie to the approved definitions, but there are significant data processing, storage and performance implications based on how this logic is computed. In many cases, the Power BI dataset can provide the same logic as an existing system but via an alternative methodology that better aligns with Power BI or the specific project need.
You have been reading a chapter from
Mastering Microsoft Power BI
Published in: Mar 2018
Publisher: Packt
ISBN-13: 9781788297233
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime