Microsoft Power BI Desktop contains a rich set of data source connectors and transformation capabilities that support the integration and enhancement of source data. These features are all driven by a powerful functional language and query engine, M, which leverages source system resources when possible and can greatly extend the scope and robustness of the data retrieval process beyond the possibilities of the standard query editor interface alone. As with almost all BI projects, the design and development of the data access and retrieval process has great implications for the analytical value, scalability, and sustainability of the overall Power BI solution.
[box type="note" align="" class="" width=""]Our article is an excerpt from the book Microsoft Power BI Cookbook, written by Brett Powell. This book shows how to leverage Microsoft Power BI and the development tools to create better data driven analytics and visualizations. [/box]
In this article, we dive into Power BI Desktop's Get Data experience and go through the process of establishing and managing data source connections and queries. Examples are provided of using the Query Editor interface and the M language directly to construct and refine queries to meet common data transformation and cleansing needs. In practice and as per the examples, a combination of both tools is recommended to aid the query development process.
Every time you click on a button to connect to any of Power BI Desktop's supported data sources or apply any transformation to a data source object, such as changing a column's data type, one or multiple M expressions are created reflecting your choices. These M expressions are automatically written to dedicated M documents and, if saved, are stored within the Power BI Desktop file as Queries. M is a functional programming language like F#, and it's important that Power BI developers become familiar with analyzing and later writing and enhancing the M code that supports their queries.
A navigation window will appear, with the different objects and schemas of the database. Select the DimGeography table from the Navigation window and click on Edit.
Figure 2: Filtering for United States only in the Query Editor
At this point, a preview of the filtered table is exposed in the Query Editor and the Query Settings pane displays the previous steps.
Figure 3: The Query Settings pane in the Query Editor
Figure 4: The SQL.Database() function created for the Source step
Figure 5: The metadata record created for the Navigation step
M is a functional language, and it can be useful to think of query evaluation in M as similar to Excel spreadsheet formulas in which multiple formulas can reference each other. The M engine can determine which expressions are required by the final expression to return and evaluate only those expressions.
Configuring Power BI Development Tools, the display setting for both the Query Settings pane and the Formula bar should be enabled as GLOBAL | Query Editor options.
Figure 6: Global layout options for the Query Editor
Figure 7: Property settings of the View tab in the Query Editor
Given its importance to the query development process, the Advanced Editor dialog is exposed on both the Home and View tabs of the Query Editor.
It's recommended to use the Query Editor when getting started with a new query and when learning the M language. After several steps have been applied, use the Advanced Editor to review and optionally enhance or customize the M query. As a rich, functional programming language, there are many M functions and optional parameters not exposed via the Query Editor; going beyond the limits of the Query Editor enables more robust data retrieval and integration processes.
Figure 8: The Home tab of the Query Editor
Figure 9: The Advanced Editor view of the DimGeography query
When developing retrieval processes for Power BI models, consider these common ETL questions:
M queries are not intended as a substitute for the workloads typically handled by enterprise ETL tools such as SSIS or Informatica. However, just as BI professionals would carefully review the logic and test the performance of SQL stored procedures and ETL packages supporting their various cubes and reports environment, they should also review the M queries created to support Power BI models and reports.
Two of the top performance and scalability features of M's engine are Query Folding and Lazy Evaluation. If possible, the M queries developed in Power BI Desktop are converted (folded) into SQL statements and passed to source systems for processing. M can also reduce the required resources for a given query by ignoring any unnecessary or redundant steps (variables).
M is a case-sensitive language. This includes referencing variables in M expressions (RenameColumns versus Renamecolumns) as well as the values in M queries. For example, the values "Apple" and "apple" are considered unique values in an M query; the Table.Distinct() function will not remove rows for one of the values. Variable names in M expressions cannot have spaces without a hash sign and double quotes. Per Figure 10, when the Query Editor graphical interface is used to create M queries this syntax is applied automatically, along with a name describing the M transformation applied. Applying short, descriptive variable names (with no spaces) improves the readability of M queries.
The query from this recipe was "folded" into the following SQL statement and sent to the ATLAS server for processing.
Figure 10: The SQL statement generated from the DimGeography M query
Right-click on the Filtered Rows step and select View Native Query to access the Native Query window from Figure 11:
Figure 11: View Native Query in Query Settings
Finding and revising queries that are not being folded to source systems is a top technique for enhancing large Power BI datasets. See the Pushing Query Processing Back to Source Systems recipe of Chapter 11, Enhancing and Optimizing Existing Power BI Solutions for an example of this process.
Figure 12: Revised query that ignores Filtered Rows step to evaluate Canada only
View Native Query (Figure 12) is not available given this revision, but a SQL Profiler trace against the source database server (and a refresh of the M query) confirms that CanadaOnly was the only SQL query passed to the source database.
Figure 13: Capturing the SQL statement passed to the server via SQL Server Profiler trace
Figure 14: Providing a user defined native SQL query
We have discussed effective techniques for accessing and retrieving data using Microsoft Power BI. Do check out this book Microsoft Power BI Cookbook for more information on using Microsoft power BI for data analysis and visualization.
Expert Interview: Unlocking the secrets of Microsoft Power BI
Tutorial: Building a Microsoft Power BI Data Model
Expert Insights:Ride the third wave of BI with Microsoft Power BI