Expert Data Modeling with Power BI

Chapter 1: Introduction to Data Modeling in Power BI

Power BI is not just a reporting tool that someone uses to build sophisticated reports; it is a platform supplying a wide range of features from data preparation to data modeling and data visualization. It is also a well-designed ecosystem, giving a variety of users the ability to contribute to their organization's data analysis journey in many ways, from sharing datasets, reports, and dashboards to using their mobile phones to add comments to a report, ask questions, and circulate it back to the relevant people. All of this is only possible if we take the correct steps in building our Power BI ecosystem. A very eye-catching and beautiful report is worth nothing if it shows incorrect business figures, or if it renders so slowly that users lose the appetite to use it.

One of the most important aspects of building a good Power BI ecosystem is getting the data right. In real-world scenarios, you normally get data from various data sources, and getting that data and mashing it up is just the beginning. You then need to come up with a well-designed data model that always represents the correct figures, supports the business logic, and performs well.

In this chapter, we'll start by learning about the different Power BI layers and how data flows between them, so we can fix any potential issues more efficiently. Then, we'll study one of the most important aspects of Power BI implementation: data modeling. You'll learn about the data modeling features available, and the limitations that apply, under different Power BI licensing plans. Finally, we'll discuss the iterative data modeling approach and its different phases.

In this chapter, we'll cover the following main sections:

  • Power BI Desktop layers
  • What data modeling means in Power BI
  • Power BI licensing considerations for data modeling
  • The iterative data modeling approach

Understanding the Power BI layers

As stated before, Power BI is not just a reporting tool. As the focus of this book is data modeling, we would rather not explain a lot about the tool itself, but some concepts should be pointed out. When we talk about data modeling in Power BI, we are indeed referring to Power BI Desktop as our development tool; you can think of Power BI Desktop as Visual Studio when developing a SQL Server Analysis Services (SSAS) Tabular model. Power BI Desktop is a free tool from Microsoft that can be downloaded from https://powerbi.microsoft.com/en-us/downloads/. So, in this book, we refer to Power BI Desktop when we say Power BI, unless stated otherwise.

The following illustration shows a very simple process we normally go through while building a report in Power BI Desktop:

Figure 1.1 – Building a new report process in Power BI

To go through the preceding processes, we use different conceptual layers of Power BI. You can see those layers in Power BI Desktop as follows:

Figure 1.2 – Power BI layers

Download the Microsoft Contoso Sales sample for Power BI Desktop from https://www.microsoft.com/en-us/download/confirmation.aspx?id=46801.

Let's discuss each point in detail:

  • The Power Query (data preparation) layer
  • The data model layer
  • The data visualization layer

The data preparation layer (Power Query)

In this layer, you get data from various data sources, transform and cleanse that data, and make it available for other layers. This is the very first layer that touches your data, so it is a very important part of your data journey in Power BI. In the Power Query layer, you decide which queries load data into your data model and which ones will take care of data transformation and data cleansing without loading the data into the data model:

Figure 1.3 – Power Query

The data model layer

This layer has two views: the Data view and the Model view. In the Data view, you can see the data, and in the Model view, you can see the data model.

The Data view

After we are done with our data preparation in the Power Query layer, we load the data into the data model layer. Using the Data view, we can see the underlying data in our data model layer after it has been transformed in the data preparation layer. Depending on the connection mode, this view may or may not be accessible. Besides seeing the output of the data preparation, we can also take other actions in this view, such as creating calculated tables, calculated columns, and measures, or copying data from tables.

Note

All objects we create in DAX are a part of our data model.

The following screenshot shows the Data view in Power BI Desktop when the storage mode of the table is set to Import:

Figure 1.4 – Data view; storage mode: Import

The Data view only shows the underlying data when the storage mode of the table is set to Import. If the storage mode is set to DirectQuery, the data will not be shown in the Data view:

Figure 1.5 – Data view; storage mode: DirectQuery

The Model view

As its name implies, the Model view is where we stitch all the pieces together. Not only can we visually see how the tables are related in the model, but we can also create new relationships, format fields, add synonyms, show/hide fields, and so on:

Figure 1.6 – Model view

The data visualization layer

In this layer, we bring the data to life by making meaningful and professional-looking data visualizations. This layer is accessible from the Report view, which is the default view in Power BI Desktop.

The Report view

In the Report view, we can build storytelling visualizations to help businesses make data-driven decisions on top of their data. For convenience, we can also create DAX calculations such as calculated tables, calculated columns, and measures from the Fields pane in the Report view, but this doesn't mean those objects are a part of the data visualization layer. Indeed, those calculations are a part of the data model layer:

Figure 1.7 – The Report view

Download the Sales & Returns sample.pbix file from https://docs.microsoft.com/en-us/power-bi/create-reports/sample-datasets#sales--returns-sample-pbix-file.

How data flows in Power BI

Understanding how data flows during its journey in Power BI is important from a maintenance perspective. For instance, when you see an issue with some calculations in a report, you'll know how to do a root cause analysis and trace the issue back to an actionable point. If you find an issue with a figure in a line chart, and that line chart uses a measure that depends on a calculated column, you quickly know that you won't find that calculated column in Power Query, as objects created in the data model are not accessible in Power Query. Likewise, you will never look for a measure in the Power Query layer, nor expect to use Power Query custom functions in the data model layer. We will discuss custom functions in Chapter 3, Data Preparation in Power Query Editor, Custom Functions:

Figure 1.8 – The flow of data in Power BI

To understand this better, let's go through a scenario.

In a Power BI report, the developer has defined a query parameter. The parameter has a list of capital letters, E, O, and P. There is also a Product query in Power Query holding descriptive information about the product. The Product Name column is filtered by the parameters list. So, when the developer selects E from the parameter, the Product query filters the results showing only the products whose name starts with E.

You put a table visual on the report canvas with the Product Name column. Can you add a slicer to the report canvas showing the parameter's values, so that the end user can change the value in the slicer and see the changes in the table visual?

This is a real-world question you may get from time to time from Power BI developers. To answer the question, you need to think about Power BI layers. Let's do some analysis:

  • Query parameters are defined in the data preparation layer in Power Query.
  • Filtering a query is also a transformation step in Power Query, which changes the result sets of the query. Therefore, when we import the data into the data model, the result sets will not change unless we go back to Power Query and change the parameters' values, which consequently changes the result sets of the Product query and imports the new result sets to the data model.
  • By default, query parameters' values are not loaded into the data model unless the developer sets Enable load. Even then, Enable load only loads the selected value from the parameters list, not the whole list.
  • A slicer is a visual. So, now we are talking about the data visualization layer. This means the slicer can only get values available in the data model.

So, the answer is no. The data visualization layer can only access data after the result sets of a query have been imported into the data model.

What data modeling means in Power BI

Data modeling is undoubtedly one of the most important parts of Power BI development. The purpose of data modeling in Power BI is different from that of data models in transactional systems. In a transactional system, the goal is a model optimized for recording transactional data. A well-designed data model in Power BI, in contrast, must be optimized for querying the data and for reducing the dataset size by aggregating that data.

Not everyone has the luxury of a prebuilt data warehouse, so in many cases you end up creating a data model in Power BI yourself. It is very tempting to take the whole data as is from the various data sources and import it into Power BI. But answering business questions can quickly translate into complex queries that take a long time to process, which is not ideal. So, resist the temptation to import everything from the data sources into Power BI and solve the problems later. Instead, try to get your data model right so that it can precisely answer business-driven questions in the most performant way. When modeling data in Power BI, you need to build the data model based on the business logic. Having said that, you may need to join different tables and aggregate your data to the level that answers all business-driven questions. It can get worse if you have data from various data sources at different grains representing the same logic.

Therefore, you need to transform and reshape that data before loading it to the data model in Power BI. Power Query is the right tool to take care of all that. After we cut all the noise from our data, we have a clean, easy-to-understand, and easy-to-work-with data model.

Semantic model

Power BI inherits its engine from Power Pivot and SSAS Tabular models. All of them use the xVelocity engine, an updated version of the VertiPaq engine designed for in-memory data analysis. The semantic model objects, such as tables, relationships, hierarchies, and measures, are stored in memory, leveraging column store indexing. All of this means that you would expect tremendous performance gains over highly compressed data, right? Well, it depends. If you efficiently transformed the data to support the business logic, then you can expect fast and responsive reports. Conceptually, once you import your data into the data model in Power BI, you have built a semantic model. A semantic model is a unified data model that provides business context to data. The semantic model can be accessed from various tools to visualize the data without needing to transform it again. In that sense, when you publish your reports to the Power BI service, you can analyze the dataset in Excel or use third-party tools such as Tableau to connect to a Power BI dataset backed by a Premium capacity and visualize your data.

Building an efficient data model in Power BI

An efficient data model can quickly answer all business questions that you are supposed to answer; it is easy to understand and easy to maintain. Let's analyze the preceding sentence. Your model must do the following:

  • Perform well (quickly)
  • Be business-driven
  • Decrease the level of complexity (be easy to understand)
  • Be maintainable with low costs

Let's look at the preceding points with a scenario.

You are tasked to create a report on top of three separate data sources, as follows:

  • An OData data source with 15 tables. The tables have between 50 and 250 columns.
  • An Excel file with 20 sheets that are interdependent with many formulas.
  • A data warehouse hosted in SQL Server. You need to get data from five dimensions and two fact tables:

    - Of those five dimensions, one is a Date dimension and the other is a Time dimension. The grain of the Time dimension is hour, minute.

    - Each of the fact tables has between 50 and 200 million rows. The grain of both fact tables from a date and time perspective is day, hour, minute.

    - Your organization has a Power BI Pro license.

There are already a lot of important points in the preceding scenario that you must consider before starting to get the data from the data sources. There are also a lot of points that are not clear at this stage. I have pointed out some of them:

  • OData: OData is an online data source, so it could be slow to load the data from the source system.

    The tables are very wide, which can potentially impact performance.

    Our organization has a Power BI Pro license, so we are limited to 1 GB file size.

    The following questions should be answered by the business. Without knowing the answers, we may end up building a report that is unnecessarily large with poor performance. This can potentially lead to the customer's dissatisfaction:

    (a) Do we really need to import all the columns from those 15 tables?

    (b) Do we need to import all the data, or is just a portion of it enough? In other words, if there is 10 years' worth of data kept in the source system, does the business need to analyze all of it, or does just 1 or 2 years of data fit the purpose?

  • Excel: Generally, Excel workbooks with many formulas can be quite hard to maintain. So, we need to ask the following questions of the business:

    (a) How many of those 20 sheets of data are going to be needed by the business? You may be able to exclude some of those sheets.

    (b) How often are the formulas in the Excel file edited? This is a critical point as modifying the formulas can easily break your data processing in Power Query and generate errors. So, you would need to be prepared to replicate a lot of formulas in Power BI if needed.

  • Data warehouse in SQL Server: It is beneficial to have a data warehouse as a source system as data warehouses normally have a much better structure from an analytical viewpoint. But in our scenario, the finest grain of both fact tables is down to a minute. This can potentially turn into an issue pretty quickly. Remember, we have a Power BI Pro license, so we are limited to a 1 GB file size only. Therefore, it is wise to clarify some questions with the business before we start building the model:

    (a) Does the business need to analyze all the metrics down to the minute or is day-level enough?

    (b) Do we need to load all the data into Power BI, or is a portion of the data enough?

    We now know the questions to ask, but what if the business needs to analyze the whole history of the data? In that case, we may consider using composite models with aggregations.

    Having said all that, there is another point to consider. We already have five dimensions in the data warehouse. Those dimensions can potentially be reused in our data model. So, it is wise to look at the other data sources and find commonalities in the data patterns.

You may come up with some more legitimate points and questions. However, you can quickly see in the previously mentioned points and questions that you need to talk to the business and ask questions before starting the job. It is a big mistake to start getting data from the source systems before framing your questions around business processes, requirements, and technology limitations. There are also some other points that you need to think about from a project management perspective that are beyond the scope of this book.

The initial points to take into account for building an efficient data model are as follows:

  • We need to ask questions of the business to avoid confusion and potential rework in the future.
  • We need to understand the technology limitations and come up with solutions.
  • We have to have a good understanding of data modeling, so we can look for common data patterns to prevent overlaps.

At this point, you may think, "OK, but how can we get there?" My answer is that you have already taken the first step, which is reading this book. All the preceding points and more are covered in this book. The rest is all about you and how you apply your learning to your day-to-day challenges.

Star schema (dimensional modeling) and snowflaking

First things first: star schema and dimensional modeling are the same thing. In Power BI data modeling, the term star schema is more commonly used, so that is the term we will use in this book. The intention of this book is not to teach you dimensional modeling; nevertheless, we'll focus on how to model data using star schema techniques, and we will remind you of some star schema concepts along the way.

Transactional modeling versus star schema modeling

In transactional systems, the main goal is to improve the system's performance in creating new records and updating/deleting existing records. So, it is essential to go through the normalization process to decrease data redundancy and increase data entry performance when designing transactional systems. In a straightforward form of normalization, we break tables down into master-detail tables.

But the goal of a business analysis system is very different. In business analysis, we need a data model optimized for querying in the most performant way.

Let's continue with a scenario.

Say we have a transactional retail system for international retail stores. We have hundreds of transactions every second from different parts of the world. The company owner wants to see the total sales amount in the past 6 months.

This calculation sounds easy. It is just a simple SUM of sales. But wait, we have hundreds of transactions every second, right? Even at only 100 transactions per second, we have 8,640,000 transactions a day. So, for 6 months of data, we have more than 1.5 billion rows. A simple SUM of sales will therefore take a considerable amount of time to process.

The scenario continues: Now, a new query comes from the business. The company owner now wants to see the total sales amount in the past 6 months by country and city. He simply wants to know what the best-selling cities are.

We need to add another condition to our simple SUM calculation, which translates to a join to the geography table. For those coming from a relational database design background, it will be obvious that joins are relatively expensive operations. This scenario can go on and on. So, you can imagine how quickly you will face a severe issue.

In the star schema, however, we have already joined those tables based on business entities, and aggregated and loaded the data into denormalized tables. In the preceding scenario, the business is not interested in seeing every transaction at the second level. We can summarize the data at the day level, which decreases the number of rows from 1.5 billion to a couple of thousand rows for the whole 6 months. Now you can imagine how much faster the summation runs over thousands of rows instead of 1.5 billion.
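To make the effect of that aggregation concrete, here is a small Power Query M sketch. The table and column names (Sales, TransactionDateTime, SalesAmount) are hypothetical, chosen only for illustration:

```m
// Hypothetical sketch: summarize second-level sales transactions to day level.
let
    Source = Sales,
    // Derive a date-only column from the transaction timestamp
    AddDate = Table.AddColumn(Source, "OrderDate",
        each Date.From([TransactionDateTime]), type date),
    // Group to one row per day instead of one row per transaction
    DailySales = Table.Group(AddDate, {"OrderDate"},
        {{"SalesAmount", each List.Sum([SalesAmount]), type number}})
in
    DailySales
```

With this shape loaded into the model, the SUM runs over one row per day rather than hundreds of rows per second.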

The idea of the star schema is to keep all numeric values in separate tables called fact tables and put all descriptive information into other tables called dimension tables. Usually, the fact tables are surrounded by dimensions that explain those facts. When you look at a data model with a fact table in the middle surrounded by dimensions, you see that it looks like a star. Therefore, it is called a star schema. In this book, we generally use the Adventure Works DW data, a renowned Microsoft sample dataset, unless stated otherwise. Adventure Works is an international bike shop that sells products both online and in their retail shops. The following figure shows Internet Sales in a star schema shape:

Figure 1.9 – Adventure Works DW, Internet Sales star schema

Snowflaking

Snowflaking is when you do not have a perfect star schema in which the dimension tables directly surround your fact tables. In some cases, some levels of description are stored in separate tables, so some dimensions in your model link to other tables that describe them in greater detail. In other words, snowflaking is normalizing the dimension tables. In some cases, snowflaking is inevitable; nevertheless, the general rule of thumb in Power BI data modeling (when following a star schema) is to avoid snowflaking as much as possible. The following figure shows snowflaking in Adventure Works Internet Sales:

Figure 1.10 – Adventure Works, Internet Sales snowflakes

Understanding denormalization

In real-world scenarios, not everyone has the luxury of having a pre-built data warehouse designed in a star schema. In reality, snowflaking in data warehouse design is inevitable. So, you may build your data model on top of various data sources, including transactional database systems and non-transactional data sources such as Excel files and CSV files. So, you almost always need to denormalize your model to a certain degree. Depending on the business requirements, you may end up having some level of normalization along with some denormalization. The reality is that there is no specific rule for the level of normalization and denormalization. The general rule of thumb is to denormalize your model so that a dimension can describe all the details as much as possible.

In the preceding example from Adventure Works DW, we have snowflakes of Product Category and Product Subcategory that can be simply denormalized into the Product dimension.

Let's go through a hands-on exercise.

Go through the following steps to denormalize Product Category and Product Subcategory into the Product dimension.

Note

You can download the Adventure Works, Internet Sales.pbix file from here:

https://github.com/PacktPublishing/Expert-Data-Modeling-with-Power-BI/blob/master/Adventure%20Works%20DW.pbix

Open the Adventure Works, Internet Sales.pbix file and follow these steps:

  1. Click Transform data on the Home tab in the Queries section.
  2. Click the Product Query.
  3. Click Merge Queries on the Home tab in the Combine section.
  4. Select Product Subcategory from the drop-down list.
  5. Click ProductSubcategoryKey on the Product table.
  6. Click ProductSubcategoryKey on the Product Subcategory table.
  7. Select Left Outer (all from first matching from the second) from the Join Kind dropdown.
  8. Click OK:

Figure 1.11 – Merging Product and Product Subcategory

This adds a new step named Merged Queries, which creates a new structured column named Product Subcategory. As you can see, the values of this column are all Table; this type of column is called a structured column:

You will learn more about structured columns in Chapter 3, Data Preparation in Power Query Editor.

Figure 1.12 – Merging the Product and Product Subcategory tables

Now let's look at how to expand a structured column in the query editor:

  1. Click the Expand button to expand the Product Subcategory column.
  2. Select ProductCategoryKey.
  3. Select the EnglishProductSubcategoryName column and unselect the rest.
  4. Unselect Use original column names as prefix.
  5. Click OK:

Figure 1.13 – Expanding Structured Column in the Query Editor

So far, we have added the EnglishProductSubcategoryName and ProductCategoryKey columns from the Product Subcategory query to the Product query. The next step is to add EnglishProductCategoryName from the Product Category query. To do so, we need to merge the Product query with Product Category:

  1. Click Merge Queries again.
  2. Select Product Category from the drop-down list.
  3. Select ProductCategoryKey from the Product table.
  4. Select ProductCategoryKey from the Product Category table.
  5. Select Left Outer (all from first matching from second).
  6. Click OK:

Figure 1.14 – Merging Product and Product Category

This adds another step and a new structured column named Product Category. We now need to do the following:

  1. Expand the new column.
  2. Pick EnglishProductCategoryName from the list.
  3. Unselect Use original column name as prefix.
  4. Click OK:
Figure 1.15 – Merging Product and Product Category

The next step is to remove the ProductCategoryKey column as we do not need it anymore. To do so, do the following:

  1. Click on the ProductCategoryKey column.
  2. Click the Remove Columns button in the Manage Columns section of the Home tab:
Figure 1.16 – Removing a column in the Query Editor

Now we have merged the Product Category and Product Subcategory snowflakes with the Product query. So, you have denormalized the snowflakes.
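Behind the scenes, the UI steps above generate Power Query M code. A simplified sketch of what those generated steps might look like (step names are approximate, and the query references assume the query names used in this exercise) is:

```m
let
    // Merge Product with Product Subcategory (left outer join)
    MergedSubcategory = Table.NestedJoin(
        Product, {"ProductSubcategoryKey"},
        #"Product Subcategory", {"ProductSubcategoryKey"},
        "Product Subcategory", JoinKind.LeftOuter),
    // Expand the structured column, keeping only two columns
    ExpandedSubcategory = Table.ExpandTableColumn(
        MergedSubcategory, "Product Subcategory",
        {"ProductCategoryKey", "EnglishProductSubcategoryName"}),
    // Merge with Product Category and expand its name column
    MergedCategory = Table.NestedJoin(
        ExpandedSubcategory, {"ProductCategoryKey"},
        #"Product Category", {"ProductCategoryKey"},
        "Product Category", JoinKind.LeftOuter),
    ExpandedCategory = Table.ExpandTableColumn(
        MergedCategory, "Product Category", {"EnglishProductCategoryName"}),
    // ProductCategoryKey is no longer needed after denormalization
    RemovedKey = Table.RemoveColumns(ExpandedCategory, {"ProductCategoryKey"})
in
    RemovedKey
```

You do not need to write this M by hand; the Query Editor UI produces equivalent steps, which you can inspect in the Advanced Editor.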

The very last step is to unload both the Product Category and Product Subcategory queries:

  1. Right-click on each query.
  2. Untick Enable load from the menu.
  3. Click Continue on the Possible Data Loss Warning pop-up message:
Figure 1.17 – Unloading queries in the Query Editor

Now we need to import the data into the data model by clicking Close & Apply:

Figure 1.18 – Importing data into the data model

We have now achieved what we were after: we denormalized the Product Category and Product Subcategory tables. Rather than loading those two tables, we now have EnglishProductCategoryName and EnglishProductSubcategoryName represented as new columns in the Product table.

Job done!

Power BI licensing considerations

At this point, you may be wondering how Power BI licensing affects data modeling. It does, as each licensing tier comes with a set of features that can potentially affect the data modeling. Nevertheless, regardless of the licensing tier you are using, Power BI Desktop is free of charge. In this section, we'll quickly look at some licensing considerations related to data modeling.

The following table is a simplified version of the Power BI feature comparisons published on the Microsoft website separately based on different licenses:

Figure 1.19 – A simplified version of Power BI feature comparisons

Maximum size of individual dataset

As the table illustrates, we are limited to 1 GB for each dataset published to the Power BI service under Free or Professional licensing. Therefore, managing the file size is quite important. There are several ways to keep the file size just below the limit, as follows:

  • Import the necessary columns only.
  • Import just a portion of data when possible. Explain the technology limitation to the business and ask whether you can filter out some data. For instance, the business may not need to analyze 10 years of data, so filter older data in Power Query.
  • Use aggregations. In many cases, the data is stored in the source at a very low granularity, while the business requires analysis at a higher grain. You can therefore aggregate the data to the higher granularity before importing it into the data model. For instance, the data may be stored at the minute level while the business only needs to analyze it at the day level.
  • Consider disabling auto date/time settings in Power BI Desktop.
  • Consider optimizing data types.

We will cover all the preceding points in the upcoming chapters.

Incremental data load

One of the coolest features available in Power BI is the ability to set up an incremental data load. Incremental data loading in Power BI is inherited from SSAS to work with large models. When it is set up correctly, Power BI does not truncate the dataset and re-import all the data from scratch. Instead, it only imports the data that has changed since the last data refresh. Incremental data load can therefore significantly improve data refresh performance and decrease the processing load on your tenant. It is available in both Professional and Premium licenses.
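Incremental refresh relies on two reserved datetime parameters, RangeStart and RangeEnd, which Power BI substitutes at refresh time to partition the data. A minimal Power Query filter sketch (the Source query and OrderDate column are assumed names) looks like this:

```m
// RangeStart and RangeEnd are the reserved parameters Power BI uses
// to define each incremental refresh partition.
FilteredRows = Table.SelectRows(
    Source,
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
```

With this filter in place, you configure the incremental refresh policy on the table in Power BI Desktop, and the service refreshes only the recent partitions.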

Calculation groups

Calculation groups are like calculated members in MultiDimensional eXpressions (MDX). Calculation groups were initially introduced in SSAS 2019 Tabular models. They are also available in Azure Analysis Services and all Power BI licensing tiers.

It is a common scenario that you create (or already have) some base measures in your Power BI model and then create many time intelligence measures on top of those base measures. In our sample file, we have three measures, as follows:

  • Product cost: SUM('Internet Sales'[TotalProductCost])
  • Order quantity: SUM('Internet Sales'[OrderQuantity])
  • Internet sales: SUM('Internet Sales'[SalesAmount])

The business requires the following time intelligence calculations on top of all the preceding measures:

  • Year to date
  • Quarter to date
  • Month to date
  • Last year to date
  • Last quarter to date
  • Last month to date
  • Year over year
  • Quarter over quarter
  • Month over month

We have nine calculations to be built on top of every single measure we have in our model. Hence, we end up having 9 x 3 = 27 measures to build in our model. You can imagine how quickly the number of measures can rise in the model, so you should not be surprised if someone tells you that they have hundreds of measures in their Power BI model.
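Without calculation groups, each of those combinations is a separate measure. For example, just the year-to-date variations of the three base measures might look like the following DAX (assuming a date table named 'Date' marked as a date table; the measure names are illustrative):

```dax
-- Three separate YTD measures; the same pattern repeats for QTD, MTD,
-- last year to date, and so on, for every base measure.
Internet Sales YTD = CALCULATE ( [Internet sales], DATESYTD ( 'Date'[Date] ) )
Product Cost YTD   = CALCULATE ( [Product cost],   DATESYTD ( 'Date'[Date] ) )
Order Quantity YTD = CALCULATE ( [Order quantity], DATESYTD ( 'Date'[Date] ) )
```

A calculation group replaces this explosion of measures with a single set of calculation items that apply to any base measure.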

Another common scenario is when we have multiple currencies. Without calculation groups, you need to use the FORMAT() function in DAX to convert the values into strings so that each figure is displayed with the relevant currency symbol. Now, if you combine that point with the time intelligence measures above, you can see how quickly the issue grows.
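
To illustrate the workaround, here is a sketch of the FORMAT() approach; the 'Currency' table, its CurrencyCode column, and the format strings are illustrative assumptions. Note that the result is text, so it can no longer be aggregated by visuals:

```dax
-- Hypothetical measure that renders [Internet Sales] with a currency symbol.
Internet Sales Formatted =
FORMAT (
    [Internet Sales],
    SWITCH (
        SELECTEDVALUE ( 'Currency'[CurrencyCode] ),  -- hypothetical column
        "USD", "$#,##0.00",
        "EUR", "€#,##0.00",
        "#,##0.00"                                   -- fallback format
    )
)
```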

Calculation groups solve those sorts of problems. We cover calculation groups in Chapter 10, Advanced Data Modeling Techniques.

Shared datasets

As the name implies, a shared dataset is a dataset used across various reports in a modern workspace (a new workspace experience) within the Power BI service. Therefore, it is only available in the Power BI Professional and Power BI Premium licensing plans. This feature is crucial for data modelers: it gives us the flexibility to build a single, more generic dataset covering several business entities, instead of several datasets that share many commonalities.

Power BI Dataflows

Dataflows, also referred to as Power Query Online, provide a centralized data preparation mechanism in the Power BI service that others across the organization can take advantage of. As with Power Query in Power BI Desktop, we can prepare, clean, and transform data in dataflows. However, Power Query queries created in Power BI Desktop are isolated within a dataset once published to the Power BI service, whereas dataflows let you share the data preparation, cleansing, and transformation processes across the organization.

You create Power BI dataflows inside a workspace, so they are only available to Professional and Premium users. We will also cover Power BI dataflows in future chapters.

The iterative data modeling approach

Like many other software development approaches, data modeling is an ongoing process. You start talking to the business, then apply the business logic to your model. You carry on with the rest of your Power BI development. In many cases, you build your data visualizations and then find out that you will get better results if you make some changes in your model. In many other cases, the business logic applied to the model is not what the business needs. This is a typical comment that many of us will get from the business after the first few iterations:

This looks really nice, but unfortunately, it is not what we want.

So, taking advantage of an agile approach would be genuinely beneficial for Power BI development. Here is the iterative approach you can follow in your Power BI development:

Figure 1.20 – The iterative data modeling approach


Information gathering from the business

Like all other software development processes, Power BI development starts with gathering information from the business to get a better understanding of the business requirements. In the real world, a business analyst may take care of this step; then again, a lot of Power BI users are business analysts. Whether you gather the requirements yourself or a business analyst does it and you are the data modeler, you need to analyze the information you get from the business. You have to ask relevant questions and come up with a list of design possibilities. You have to identify potential risks and discuss them with the customer. You also need to be aware of technology limitations and discuss those with the customer as well. Once you have answers to your questions and a list of design possibilities, risks, and technology limitations, you can move on to the next step more confidently.

Data preparation based on the business logic

You now have a lot on your plate. You need to get the data from various data sources and go through the data preparation steps. Because you now understand the business logic, you can take the proper steps in your data preparation. For instance, if the business requires you to connect to an OData data source, retrieving only the columns the business needs lets you prepare the data more efficiently, with all the design risks and technology limitations in mind. After you have consciously prepared your data, you move on to the next step, data modeling.

Data modeling

If you took the proper actions in the previous steps, your data model will be much tidier, so you can build it more efficiently. Now you need to think about the analytical side of things, while still keeping all the business requirements, design possibilities, risks, and technology limitations in mind. For instance, if the business cannot tolerate data latency longer than 5 minutes, you may need to consider using DirectQuery. Using DirectQuery comes with some limitations and performance risks, so you need to choose the design approach that best satisfies the business requirements. We cover DirectQuery in Chapter 4, Getting Data from Various Sources, in the Dataset storage modes section.

Testing the logic

This is one of the most overlooked and yet most important steps in data modeling: testing all the business logic you implement to meet the requirements. Not only do you need to test the figures to make sure the results are accurate, but you also need to test the solution from a performance and user experience perspective. Be prepared for tons of mixed feedback, and sometimes strong criticism from the end users, especially when you think everything is OK.

Demonstrating the business logic in a basic data visualization

While we are modeling the data, we do not need to worry about the data visualization part. The fastest way to make sure all the business logic is right is to confirm it with the business, and the fastest way to do that is to demonstrate the logic in the simplest possible way, such as with table and matrix visuals and some slicers on the page. Remember, this is only to confirm the logic with the business, not the actual product delivery. In the real world, a lot of new information and surprises will come up during the demonstration, which means you then start the second iteration and gather more information from the business.

As you go through all the preceding steps several times, you'll gradually become a professional data modeler. In the next section, we'll quickly cover how professional data modelers think.

Note

This book also follows an iterative approach, so we'll go back and forth between different chapters to cover some scenarios.

Thinking like a professional data modeler

Back in the day, in the late 90s, I was working on transactional database systems. Back then, it was essential to know how to normalize your data model to at least the third normal form; in some cases, we were normalizing to the Boyce-Codd normal form. I carried out many projects, faced a lot of different issues, and made many mistakes, but I learned from those mistakes. Gradually, I became experienced enough to visualize the data model to the second, or sometimes even the third, normal form in my head while still in a requirements gathering session with the customer. All the data modeling approaches I have had a chance to work with or read about, such as transactional models, star schema, Inmon, and data vault, are based on relational data modeling, regardless of their usage. Data modeling in Power BI is no different. Professional data modelers can visualize the data model in their minds from the first information-gathering sessions they have with the customer. But as mentioned, this capability comes with experience.

Once you have enough experience in data modeling, you will be able to ask the business more relevant questions. You already know many common scenarios and pitfalls, so you can quickly recognize similar situations. Therefore, you can avoid many future changes by asking more relevant questions upfront. Moreover, you can give your customer new ideas for solving other problems down the road. In many cases, the customer's requirements will change during the project's lifetime, so you will not be surprised when those changes happen.

Summary

In this chapter, we discussed the different layers of Power BI and what is accessible in which layer, so when we face an issue, we know exactly where to look to fix it. We then discussed how, when we build a data model, we are indeed building a semantic layer in Power BI. We also covered star schema and snowflaking concepts, which are essential for modeling our data more efficiently. We then covered different Power BI licensing considerations and how they can potentially affect our data modeling. Lastly, we looked at the iterative data modeling approach, which helps deliver a more precise and reliable data model and solves many of the problems that report writers may face down the road.

In the next chapter, we will look at DAX and data modeling. We will discuss a somewhat confusing topic, virtual tables, and we will walk you through some common time intelligence scenarios to help you with your future data modeling tasks.


Key benefits

  • Understand data modeling techniques to get the best out of data using Power BI
  • Define the relationships between data to extract valuable insights
  • Solve a wide variety of business challenges by building optimal data models

Description

This book is a comprehensive guide to understanding the ins and outs of data modeling and how to create data models using Power BI confidently. You'll learn how to connect data from multiple sources, understand data, define and manage relationships between data, and shape data models to gain deep and detailed insights about your organization. In this book, you'll explore how to use data modeling and navigation techniques to define relationships and create a data model before defining new metrics and performing custom calculations using modeling features. As you advance through the chapters, the book will demonstrate how to create full-fledged data models, enabling you to create efficient data models and simpler DAX code with new data modeling features. With the help of examples, you'll discover how you can solve business challenges by building optimal data models and changing your existing data models to meet evolving business requirements. Finally, you'll learn how to use some new and advanced modeling features to enhance your data models to carry out a wide variety of complex tasks. By the end of this Power BI book, you'll have gained the skills you need to structure data coming from multiple sources in different ways to create optimized data models that support reporting and data analytics.

Who is this book for?

This MS Power BI book is for BI users, data analysts, and analysis developers who want to become well-versed with data modeling techniques to make the most of Power BI. You'll need a solid grasp of the basic use cases and functionalities of Power BI and of star schema concepts before you can dive in.

What you will learn

  • Implement virtual tables and time intelligence functionalities in DAX to build a powerful model
  • Identify Dimension and Fact tables and implement them in Power Query Editor
  • Deal with advanced data preparation scenarios while building Star Schema
  • Explore best practices for data preparation and modeling
  • Discover different hierarchies and their common pitfalls
  • Understand complex data models and how to decrease the level of model complexity with different approaches
  • Learn advanced data modeling techniques such as aggregations, incremental refresh, and RLS/OLS
Product Details

Publication date: Jun 11, 2021
Length: 612 pages
Edition: 1st
Language: English
ISBN-13: 9781800205697
Vendor: Microsoft



Table of Contents

Section 1: Data Modeling in Power BI
Chapter 1: Introduction to Data Modeling in Power BI
Chapter 2: Data Analysis eXpressions and Data Modeling
Section 2: Data Preparation in Query Editor
Chapter 3: Data Preparation in Power Query Editor
Chapter 4: Getting Data from Various Sources
Chapter 5: Common Data Preparation Steps
Chapter 6: Star Schema Preparation in Power Query Editor
Chapter 7: Data Preparation Common Best Practices
Section 3: Data Modeling
Chapter 8: Data Modeling Components
Chapter 9: Star Schema and Data Modeling Common Best Practices
Section 4: Advanced Data Modeling
Chapter 10: Advanced Data Modeling Techniques
Chapter 11: Row-Level Security
Chapter 12: Extra Options and Features Available for Data Modeling
Other Books You May Enjoy

