Diagnosing Queries
The Power Query M engine is an extremely powerful and fast data transformation and data preparation engine used across an array of products, including:
- Excel for Windows
- Excel for Mac
- Power BI
- Power Apps
- Power Automate
- Azure Data Factory
- SQL Server Integration Services
- SQL Server Analysis Services
- Dynamics 365 Customer Insights
While both fast and powerful, there are times when you may find that a particular query is not as performant as desired. In these instances, Query Diagnostics can help you pinpoint problematic expressions and better understand what Power Query is doing in order to identify areas for query optimization. This recipe demonstrates how the user can use Query Diagnostics to troubleshoot a query and identify how the query might be optimized to be more performant.
Getting ready
To get ready, import the FactCallCenter
table from the AdventureWorksDW2019 database by doing the following:
- Open Power BI Desktop and choose Transform data from the ribbon of the Home tab to open the Power Query Editor.
- Create an Import mode data source query called
AdWorksDW
. This query should be similar to the following:let Source = Sql.Database("localhost\MSSQLSERVERDEV", "AdventureWorksDW2019") in Source
- Isolate this query in a query group called Data Sources.
- Right-click
AdWorksDW
and choose Reference; select the FactCurrencyRate table in the data preview area, and rename the AdWorksDW query to FactCurrencyRate. - Move the FactCurrencyRate query to the Other Queries group.
- In the Power Query Editor, select the Tools tab and then Diagnostic Options.
- Ensure that the radio button for Enable in Query Editor (does not require running as admin) is selected and check all the boxes under the Diagnostics Level and Additional Diagnostics sub-headings.
Figure 2.46: Query Diagnostic options
For additional details on performing these steps, see the Managing Queries and Data Sources recipe in this chapter.
How to Diagnose Queries
To implement this recipe, perform the following steps:
- Select the FactCurrencyRate query, open Advanced Editor, and modify the query to calculate the daily change in currency rates.
let Source = AdWorksDW, dbo_FactCurrencyRate = Source{[Schema="dbo",Item="FactCurrencyRate"]}[Data], Sort = Table.Sort( dbo_FactCurrencyRate, {{"CurrencyKey", Order.Ascending}, {"DateKey", Order.Ascending}} ), Index = Table.TransformColumnTypes( Table.AddIndexColumn(Sort, "Row Index", 1, 1), {{ "Row Index", Int64.Type }} ), PrevIndex = Table.TransformColumnTypes( Table.AddIndexColumn(Index, "Prev Index", 0, 1), {{ "Prev Index", Int64.Type }} ), SelfJoin = Table.NestedJoin( PrevIndex, {"Prev Index"}, PrevIndex, {"Row Index"}, "NewColumn", JoinKind.LeftOuter ), PrevColumns = Table.ExpandTableColumn( SelfJoin, "NewColumn", {"EndOfDayRate", "CurrencyKey"}, {"PrevRate", "PrevKey"} ), AddChange = Table.AddColumn( PrevColumns, "Daily Change", each if [CurrencyKey] = [PrevKey] then [EndOfDayRate] - [PrevRate] else null, type decimal ), SelectColumns = Table.SelectColumns( AddChange, {"CurrencyKey", "EndOfDayRate", "Date", "Daily Change"} ) in SelectColumns
- You may have to wait several minutes for the preview data to display. To investigate what is going on, click on the Tools tab and then Start Diagnostics.
- Click back on the Home tab and click Refresh Preview.
- Wait for the preview data to refresh and then click back on the Tools tab and Stop Diagnostics.
Figure 2.47: Diagnostics processing display
- When the Diagnostics processing completes, you will have a new query group called Diagnostics containing four queries for Diagnostics_Counters, Diagnostic_Detailed, Diagnostic_Aggregated, and Diagnostic_Partitions. Each of these queries is suffixed with a date and time stamp of when the diagnostics were run, and each has loading disabled.
Figure 2.48: Diagnostics queries
- Click on the Diagnostics_Aggregated query. Select the Step, Category, Start Time, End Time, Exclusive Duration (%), and Exclusive Duration columns and remove the other columns.
- Sort the Exclusive Duration column in descending order.
Figure 49: Diagnostic results
- Look for large jumps in the Exclusive Duration column; we see two such jumps between lines 3 and 4, and 4 and 5, with the larger jump being between lines 3 and 4. Also, note the high Exclusive Duration (%) value for row 3.
- Look at the Start Time and End Time columns for row 3 and note that this operation took 4 minutes and 26 seconds to complete. The Step column shows that the query step for row 3 is the PrevColumns step.
- Having zeroed in on the PrevColumns step of the query, click on the Diagnostics_Detailed query and filter the Step column to just show PrevColumns. If the PrevColumns value does not appear in the filtering dialog, use the Load More link or edit the query in Advanced Editor to add the Table.SelectRows expression manually:
Table.SelectRows(#"Changed Type", each [Step] = "PrevColumns")
. - Sort the query by the Exclusive Duration (%) column in Descending order.
- Note that the Operation column for the top row is DbDataReader.
At this point, it should be evident that this query would likely be better done in the source database system as a view or stored procedure.
How it works
The FactCurrencyRate query joins the base table with itself in order to compare subsequent rows of data. The goal is to compare the values of one row with the previous row in order to compute a value for a change in currency rate between days. This is done by sorting the table by first the currency and then the date using the Table.Sort
function. Next, the Table
.AddIndexColumn
function is used twice, once to add an index column starting from 1 (Row Index column) and a second time to add an index column starting from 0 (Prev Index column). These two index columns are then used in a Table
.NestedJoin
function to join the table to itself. The Table
.ExpandTableColumn
function is used to expose the previous row's CurrencyKey and EndOfDayRate columns of the previous row. This information can then be used to create the Daily Change column using the Table
.AddColumns
function, and finally, only the essential columns are selected using the Table
.SelectColumns
function.
When you start Query Diagnostics, query diagnostic information is logged to JSON and CSV files stored in the application's directory. These files can be located by looking at the Source step for aggregated, detailed, and partition diagnostic queries and the CsvFiles step of the Counters diagnostic query. For the Power BI Desktop Store App, the path should be similar to the following:
C:\Users\[user]\Microsoft\Power BI Desktop Store App\Traces\Diagnostics
These files record diagnostic information performed by the query engine during processes such as refreshing the preview data. Stopping Query Diagnostics ends logging of the diagnostic data and generates queries for each enabled Query Diagnostics feature: Aggregated, Detailed, Performance counters, and Data privacy partitions. As diagnostic logging is costly in terms of performance and system resources, it is recommended to only use Query Diagnostics when troubleshooting a query's performance. In addition, only enable the minimal amount of diagnostic logging required to identify the problem—for example, often just starting with the Aggregated diagnostic data is enough to identify the problematic step(s).
There's more...
There is also a Diagnose Step feature available for Query Diagnostics. To see how Diagnose Step can be used, follow these steps:
- Open the Diagnostic Options from the ribbon of the Tools tab and uncheck the Performance counters and Data privacy partitions.
- In the Query Settings pane, select the SelfJoin step. You can now either right-click the SelfJoin step and select Diagnose or select Diagnose Step in the ribbon of the Tools tab.
- Once complete, two additional queries are added to the Diagnostics query group, FactCurrencyRate_SelfJoin_Detailed and FactCurrencyRate_SelfJoin_Aggregated, each suffixed with a date and time stamp.
- Click on the FactCurrencyRate_SelfJoin_Aggregated query.
- Add an Index column.
- Sort the Exclusive Duration column in descending order.
- Here we can see that the most expensive operation occurs early on in the process, at index 15 out of 3,000+ rows, and appears to be the initial selection of columns with the Data Source Query being the following:
select [_].[CurrencyKey], [_].[DateKey], [_].[AverageRate], [_].[EndOfDayRate], [_].[Date] from [dbo].[FactCurrencyRate] as [_] order by [_].[CurrencyKey], [_].[DateKey]
It should be evident that diagnosing a single step of a query is faster and consumes fewer system resources than analyzing the entire query. Thus, it is recommended that you run only Aggregated diagnostics over an entire query to identify problematic steps, and then run Detailed diagnostics on those steps individually. If you look at the FactCurrencyRate_SelfJoin_Detailed, the most expensive operation is on line 60, and it is the DbDataReader operation. The SQL statement identified is actually on line 59 with the operation Execute Query. Thus, we can conclude that the most expensive operation performed was not in executing the query, but rather reading the data generated by the query.
See Also
- Recording Query Diagnostics in Power BI: http://bit.ly/2ND2yqF
- Query Diagnostics: http://bit.ly/3lFz8ET
- Reading query diagnostics: http://bit.ly/3s8IFXA