Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
The Definitive Guide to Power Query (M)

You're reading from   The Definitive Guide to Power Query (M) Mastering complex data transformation with Power Query

Arrow left icon
Product type Paperback
Published in Mar 2024
Publisher Packt
ISBN-13 9781835089729
Length 758 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Greg Deckler Greg Deckler
Author Profile Icon Greg Deckler
Greg Deckler
Melissa de Korte Melissa de Korte
Author Profile Icon Melissa de Korte
Melissa de Korte
Rick de Groot Rick de Groot
Author Profile Icon Rick de Groot
Rick de Groot
Arrow right icon
View More author details
Toc

Table of Contents (19) Chapters Close

Preface 1. Introducing M 2. Working with Power Query/M FREE CHAPTER 3. Accessing and Combining Data 4. Understanding Values and Expressions 5. Understanding Data Types 6. Structured Values 7. Conceptualizing M 8. Working with Nested Structures 9. Parameters and Custom Functions 10. Dealing with Dates, Times, and Durations 11. Comparers, Replacers, Combiners, and Splitters 12. Handling Errors and Debugging 13. Iteration and Recursion 14. Troublesome Data Patterns 15. Optimizing Performance 16. Enabling Extensions 17. Other Books You May Enjoy
18. Index

The formula firewall

When working with queries, combining different data sources is common. It’s likely that after some fiddling you run into an error that prevents you from combining your data. This error is the result of the formula firewall that Power Query has, also known as the data privacy firewall. So, what exactly is the formula firewall, and how can you work with it?

What is the formula firewall?

The formula firewall is a Power Query feature that prevents accidental data transfer between sources. This feature is particularly important when dealing with sensitive information. It was designed because of query folding. Query folding allows Power Query to convert data transformations into a data source query that a data source can execute directly.

Imagine a situation where you’re working with two queries in Power Query. Query1 holds sensitive information, like social security numbers. Using an inner join, you aim to use this data to extract relevant details from Query2. However, for efficiency, you want to avoid importing the entire SQL table associated with Query2 into Power Query. Importing the whole table only to filter out the unnecessary rows later is not optimal. After all, using query folding is a more efficient approach, as it only imports the required data.

However, while query folding is effective, it also introduces a potential risk. In the process of query folding, sensitive information from Query1, such as social security numbers, could unintentionally be embedded into the data source query for Query2. If someone with database expertise—or anyone monitoring your network—were to intercept this query, they could potentially access the confidential information embedded within it.

This is where the formula firewall comes into play. Its purpose is to prevent such situations where sensitive data might be unintentionally embedded in queries sent to external sources. To recognize where a potential risk lies, the formula firewall uses partitions.

Understanding partitions

When evaluating a query with the formula firewall active, Power Query splits the query and its dependencies into partitions. These partitions are essentially groups of one or more steps. Whenever a partition references another, the formula firewall intervenes by substituting the reference with a call to a specific function, named Value.Firewall. This function ensures that partitions do not access each other directly. Instead, all references must first be evaluated by the firewall. Data is allowed into the current partition only after the firewall authorizes the interaction between partitions.

As you can imagine, the way partitions are created is incredibly important. Including fewer or more steps within a single partition could determine whether the firewall becomes involved. However, the process of partitioning queries is complex and beyond the scope of this book. For those interested in a deeper understanding of this process, further reading is available at https://learn.microsoft.com/en-us/power-query/data-privacy-firewall. We will refer to this page throughout the rest of this chapter.

The fundamental principle of the formula firewall

The formula firewall operates on a fundamental principle concerning partitions. This principle is split into two key points:

  • Referring to other partitions: A partition can use data or results from another partition. That partition can be another query or a partition within the same query.
  • Accessing compatible data sources: A partition can obtain data from external sources, provided they have a compatible privacy level.

Either of the two is possible, but not both at the same time. Violating this rule will result in one of two formula firewall messages:

  • Error from referencing:
    Formula.Firewall: Query 'X' (step 'Y') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
    
  • Error from accessing incompatible data sources:
    Formula.Firewall: Query 'X' (step 'Y') is accessing data sources with incompatible privacy levels. Please rebuild this data combination.
    

From just reading the above statements, it’s hard to understand what’s happening for a Formula Firewall error, but hold on. We’re about to make this a lot clearer. In the next section, we will explore the reasons behind these errors and discuss strategies to resolve them.

Firewall error: Referencing other partitions

The first firewall error we’ll discuss is the error triggered when referencing other partitions. The error message for this error is:

Formula.Firewall: Query 'X' (step 'Y') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

It’s important to grasp why this error pops up because it blocks your queries from producing any results. This means you won’t be able to update your data or view the outcomes of your transformations. Unfortunately, solving the Formula Firewall error without knowing its underlying mechanics can be difficult.

Consider the example shown in the following image:

A screenshot of a computer

Description automatically generated

Figure 15.29: Two similar queries where one runs into the Formula Firewall error

Here, you see two similar queries, but only one encounters the Formula Firewall error. In Query1, there’s a reference to a parameter named MyValue1, created through the user interface. The partitioning process automatically removes partitions from such native parameters, allowing Query1 to run smoothly.

However, if you manually set up a parameter that pulls a value from an Excel file, as seen with MyValue2, the partitioning process treats this differently. It doesn’t remove the partition, so any reference to MyValue2 is seen as referencing a partition.

Consequently, when Query2 tries to use MyValue2 with the Web.Contents function to reach another data source while at the same time referencing another partition, it triggers the formula firewall.

Let’s work through an example to see this error in action. To follow along with the below examples, you can download the exercise files from the GitHub repository of this book.

Connecting to a URL using native parameters

A common operation to retrieve data is to connect to a URL. Examples of this could be to retrieve movie scores from IMDb, get data from a table on Wikipedia, or return the latest weather forecast information. Regardless of what information you need, you can connect to a website using Power Query’s web connector. Doing so makes for an easy use case to illustrate the formula firewall. Here’s how.

Suppose you want to connect to the PowerQuery.How website and retrieve the HTML code. First, you’ll create a parameter in Power Query:

  1. Go to the Home tab and select Manage Parameters.
  2. Then choose New and name this parameter myURLParameter.
  3. Set it to type text, and assign the value https://powerquery.how/.

Performing these steps gives you:

A close-up of a computer screen

Description automatically generated

Figure 15.30: A parameter containing a URL to connect to

This newly created parameter, including a static URL, does not directly access data sources.

Next, let’s set up a query to connect to the website and fetch the HTML code. In the Queries area, right-click, select New Query, then click Web. In the dialog box, choose the URL type, select Parameter from the dropdown, and refer to myURLParameter:

A screenshot of a computer

Description automatically generated

Figure 15.31: Creating a web query that uses a parameter as a URL

You’ll be prompted to choose a connection method when connecting to a web source for the first time. Select Anonymous and then Connect. In the following screen, tick the box in front of HTML Code to return the HTML content. Confirm by clicking OK:

A screenshot of a computer

Description automatically generated

Figure 15.32: Selecting web content access and data retrieval options

By completing these steps, you retrieve the HTML content from the website using a hardcoded parameter, resulting in two queries.

The first query is the parameter myURLParameter, defined as:

"https://powerquery.how/" meta [IsParameterQuery=true, Type="Text", IsParameterQueryRequired=true]

The second query is called HTML Code and is defined as:

Web.BrowserContents(myURLParameter)

Up to this stage, the formula firewall has not flagged any problems and has successfully returned the HTML code. This smooth operation is due to the partitioning process, as detailed on Microsoft’s website. This process removes partitions from native parameters established via the Power Query user interface.

In our case, the HTML Code query refers to myURLParameter. However, since this parameter is not included in a partition and does not link to any data sources, the formula firewall does not trigger any alerts. Let’s now move on to a situation where a slight change in the setup leads to the Formula Firewall error.

Connecting to a URL using an Excel parameter

Now, let’s modify our approach slightly. Instead of a native parameter, we’ll retrieve the URL to connect to from a table value from an Excel file. A setup like this allows for end users to make changes to a URL in an easy-to-access Excel file, without the need to fiddle with Power Query. This provides a user-friendly way to enter input, at the risk of running into the formula firewall. Let’s find out how this works.

To follow along, make sure you have downloaded the ExcelParameterFile file from the book’s GitHub repository. It contains a table titled ParameterTable, which has a single column called URL, holding the text value https://powerquery.how/. This looks as follows:

A screenshot of a computer

Description automatically generated

Figure 15.33: An Excel file containing the URL for our query

First, save this file in your preferred location. To connect to it, go to New Query, choose Excel Workbook, and find the file. When you open it, a window will pop up. Here, select ParameterTable and click OK:

A screenshot of a computer

Description automatically generated

Figure 15.34: Selecting data from the Excel Workbook Navigator

After importing the data, you will find a table with a single cell that holds the URL. To use this URL in your queries, you should extract the value from this cell. To do this, right-click on the cell and choose the Drill Down option. This step will return the cell value for you:

A screenshot of a computer

Description automatically generated

Figure 15.35: Drill down on a table value to return its value

The query generated to extract the value from the Excel file looks like this:

let
  Source = Excel.Workbook(File.Contents(
    "C:\Data\ExcelParameterFile.xlsx"), null, true),
  ParameterTable_Table =
    Source{[Item="ParameterTable",Kind="Table"]}[Data],
  URL = ParameterTable_Table{0}[URL]
in
  URL

Rename this query myExcelURL. Do you recall the earlier situation where we used myURLParameter? What if we replace that with myExcelURL, the parameter from our Excel file? Try using this in a query:

Web.BrowserContents( myExcelURL )

After making this change, you’ll encounter an error that looks like the following:

A yellow box with black text

Description automatically generated

Figure 15.36: The Formula Firewall error when referencing queries connecting to a data source

Why does this error arise, especially when we successfully used a parameter in the previous example? To understand this, let’s focus on the fundamental principle of the formula firewall: “A partition can either reference other partitions or access data from sources that have a compatible privacy level. It cannot do both at the same time.”

In our initial example, there were two reasons for the absence of an error. Firstly, the myURLParameter was a native, hardcoded parameter that did not access any external data sources. Secondly, the partitioning process excluded the native parameter from its scope, meaning its reference was not considered as a reference to another partition.

The situation changes with a parameter derived from an Excel file. Such a parameter accesses data from another source (the Excel file), and thus, it’s included in the partitioning process. When the Web.BrowserContents function attempts to access a website (PowerQuery.How) and simultaneously references a partition (the Excel parameter), it breaches the firewall’s fundamental rule.

Partitioning in Power Query can be pretty complicated. Even reading the Microsoft documentation on it will leave you with a lot of remaining questions. However, the main point to remember is this: if you reference one query to get data from a source, like our Excel file, and then try to use that data in another query to access a source, it breaks the rules of the formula firewall. So, what’s the best way to fix this issue?

Resolving the firewall error

When we attempt to reference the Excel parameter from this different partition, we breach a key firewall rule. We cannot access a data source (in this case, the web request) and simultaneously reference another partition (the Excel parameter). Yet, there are plenty of scenarios where you want to be able to use a data point that is stored elsewhere, in this case, the Excel URL. Our objective is to enable the query to retrieve the URL from the external source while finding a way to keep the logic within the same partition. To achieve this, we will explore two possible methods.

Method 1: Using a function

The first method involves storing the myExcelURL parameter logic within a function. A function stores logic into reusable code but does not execute it. When we then call this function in a query, the accessing of data happens within the partition where the function is called. This is different from referencing a query because, in that situation, the accessing of the data occurs in the other query, which will be seen as a separate partition and potentially triggers the firewall.

To create this function, you can add a simple function definition that refers to the previous step at the end of your query:

let
  Source = Excel.Workbook(File.Contents(
      "C:\Data\ExcelParameterFile.xlsx"), null, true),
  ParameterTable_Table =
    Source{[Item="ParameterTable",Kind="Table"]}[Data],
  URL = ParameterTable_Table{0}[URL],
  myFunction = () => URL
in
  myFunction

This script creates a function. Make sure to name it fxMyExcelURL. Use this function in another query to make a web request, like this:

 Web.BrowserContents( fxMyExcelURL() )

Doing this allows you to make your request successfully without running into the formula firewall. You can find this solution in the exercise files of this chapter.

Method 2: Integrating the logic into one query

A second approach is integrating all the logic of accessing the Excel file into the same query. Doing so ensures that all related steps fall within a single partition, which allows using the parameter to access an external data source under the formula firewall’s rules.

To consolidate the steps, you can take the logic of the Excel Workbook query and add a new step at the end of the query. In that step, you can reference the URL from the Excel workbook as follows:

let
  Source = Excel.Workbook(File.Contents(
    "C:\Data\ExcelParameterFile.xlsx"), null, true),
  ParameterTable_Table =
    Source{[Item="ParameterTable",Kind="Table"]}[Data],
  myContainedExcelURL = ParameterTable_Table{0}[URL],
  HTMLCode = Web.BrowserContents( myContainedExcelURL )
in
    HTMLCode

Combining these steps into a single query allows the partitioning process to keep the steps that access data within the same partition. This approach successfully resolves the first Formula Firewall error. To see this for yourself, you can find the complete query in this chapter’s exercise files.

By applying the above steps, you end up in the following situation:

Figure 15.37: Prompt for setting privacy levels

The error informs you that it requires additional information about data privacy. If you press Continue and select Public, your query will return its results. However, by choosing Private or Organizational, your query may lead to the second type of Formula Firewall error, which we will discuss next.

Firewall error: Accessing compatible data sources

The second firewall error can happen when you combine data sources with incompatible privacy levels. Remember how the firewall is meant to prevent unintended data leakage? That’s what this error is about. The message for this error is:

Formula.Firewall: Query 'X' (step 'Y') is accessing data sources with incompatible privacy levels. Please rebuild this data combination.

Consider the scenarios depicted in the following image:

A screenshot of a computer

Description automatically generated

Figure 15.38: Two queries where one contains data sources that are incompatible

The image shows two queries. Focusing on Query1, it accesses data from two external sources. If the Web.Contents step tried to reference another partition, it would normally trigger a “referencing other partitions” error due to simultaneously accessing a data source and referencing another partition. However, that’s not the issue here as the data is within the same query. That allows the partitioning process to put both steps within the same partition. The real problem for Query1 is a mismatch in privacy levels: the data source MyExcelURL1 is marked as ‘Private, while the Web.Contents step is set to the Organizational privacy level. Because these levels don’t match, Power Query reports the second firewall error for accessing incompatible data sources.

Query2 doesn’t encounter this error because both steps in the query involve data sources with matching privacy levels. Therefore, the formula firewall allows both data sources to combine.

Understanding this might seem complex, but don’t worry. We’ll break down this example further with a practical case. To grasp the error message fully, it’s essential to first learn about privacy levels.

Understanding privacy levels

In Power Query, each data source can be assigned a privacy level. These levels are important for determining how data from different sources can be combined.

Here’s an overview of the privacy levels and their effects on query folding:

  • Public: Data from a Public source can be freely combined with other data sources.
  • Organizational: This setting allows data to be combined only with other Organizational sources.
  • Private: The most restrictive setting. Private data does not combine with other sources, maintaining strict isolation.

Think of privacy levels like this: if you label a data source as Private, it won’t mix with other data in queries. This keeps sensitive information safe and separate. Data marked as Organizational can only be mixed with other Organizational data. Public data offers the most flexibility as it can easily be combined with any other data source that permits it. This typically includes data sources with their privacy levels set to Public or None.

Setting privacy levels

When you attempt to combine a data source without a set privacy level, Power Query prompts you to choose one. To set privacy levels manually:

  1. In Power BI Desktop:
    • Navigate to the Home tab on the Power Query ribbon and select Data Source Settings.
    • Choose the data source to modify and click Edit Permissions to set its privacy level.
A screenshot of a computer

Description automatically generated

Figure 15.39: Setting privacy levels in Data source settings

  1. In the Power BI service:
    • Access settings via the top-right gear icon, then select Manage Connections and Gateways.
    • Find your data source and set the Privacy Level at the bottom of the screen.

So, what does this have to do with the formula firewall?

Resolving the firewall error

Let’s revisit our previous example where we retrieved a URL from an Excel file. This action prompted an error message requesting privacy-level information for the data sources:

Figure 15.40: Error message requesting information on privacy levels

The above message requires information about the data privacy level of the web request. To resolve this error, we perform one of two methods.

Method 1: Setting compatible privacy levels

Whether or not we receive the Formula Firewall error depends on the privacy level settings:

  • Public: Your query will execute immediately unless the Excel file we connect to has its privacy level set to Private or Organizational, which causes a firewall error. The privacy levels None and Public are both compatible.
  • Organizational: Setting the query to Organizational prompts a check of the Excel file’s privacy level. The query runs smoothly if the file we combine with is set to Organizational. Any other privacy level will trigger the Formula Firewall error.
  • Private: Selecting Private invariably leads to a firewall error.

Let’s say we want to set the privacy level to Organizational. We encounter a firewall error when selecting Continue and assigning Organizational to our data source. This happens because, while we set the web request to Organizational, the privacy level for the Excel file connection is still undefined.

To resolve this:

  1. Navigate to Data Source Settings.
  2. Locate the Excel file under Data sources in current file.
  3. Use Edit Permissions to set the Excel file’s privacy level to Organizational.
  4. Confirm your choice, then return to your query.
  5. Click Refresh Preview to update your queries.

After these steps, your queries should function without privacy-level-related errors.

Method 2: Ignoring privacy levels

Another method to prevent the Formula Firewall error is to ignore the privacy level settings. The formula firewall’s role as a gatekeeper between your data sources can slow down query performance. This feature is important for preventing data leaks, especially in query folding scenarios. However, you have an alternative if you’re dealing with non-sensitive data.

In Power BI Desktop, you can choose to ignore privacy levels. This may also help to improve the performance of your queries as any compatibility checks can be skipped.

In situations like our previous example, where you’re asked to set privacy levels, continuing without setting them triggers a specific pop-up message:

A screenshot of a computer error

Description automatically generated

Figure 15.41: Privacy levels notification – option to ignore privacy levels

You have the choice to select Ignore Privacy Levels. Opting for this allows your query to execute without running into privacy-related Formula Firewall errors.

To find this setting:

  • Navigate to File, then Options and Settings, and click Options.
  • Select Always ignore Privacy Level settings in the Privacy section.

That looks as follows:

A screenshot of a computer

Description automatically generated

Figure 15.42: The Options menu to configure Privacy Levels

Important: Using this setting might expose sensitive or confidential data. Additionally, this setting is only applicable in Power BI Desktop. The Power BI service does not respect this setting, and you’ll need to configure the appropriate privacy levels.

To conclude, this chapter has demonstrated how to address various Formula Firewall errors using specific strategies. We solved the partition referencing error by consolidating logic within a single query and creating a custom function. For the data privacy error, we either aligned data privacy levels or ignored privacy settings entirely.

However, it’s important to note that the examples provided in this chapter do not support refreshing in the Power BI service. This limitation arises because the example involves fetching a URL in one query and using it in another. Power Query’s engine requires that the base URL of a web request is explicitly visible to the M engine. Our dynamic referencing of the URL obscures this, leading to a dynamic queries error.

Despite this specific limitation for a web request, the techniques outlined in this chapter remain effective for resolving Formula Firewall situations. For instance, if these methods were applied to filter a database query, the query would successfully refresh in the Power BI service.

So far, you’ve learned how to deal with the formula firewall and what you can do to ensure query folding takes place for your queries. In the next section, we’ll look at other strategies to improve the performance of our queries.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image