Transform basics
Applying data transformations within the Power Query Editor can be a surprisingly simple thing to do. However, there are a few things to consider as we begin this process. The first is that there are multiple ways to solve a problem. As you work your way through this book, the authors have tried to show you the fastest and easiest methods of solving the problems that are presented, but these solutions will certainly not be the only ways to reach your goals.
The next thing you should understand is that every click you do inside the Power Query Editor is automatically converted into a formula language called M. Virtually all the basic transforms you will need can be accomplished by simply interacting with the Power Query Editor user interface, but for more complex business problems there is a good chance you may have to modify the M queries that are written for you by the editor. You will learn more about M later in this chapter.
Finally, the last important consideration to understand is that all transforms that are created within the editor are stored in the Query Settings pane under a section called Applied Steps. Why is this important to know? The Applied Steps section has many features, but here are some of the most critical to know for now:
- Deleting transforms: If you make a mistake and need to undo a step, you can click the Delete button next to a step.
- Modifying transforms: This can be done with any step that has a gear icon next to it.
- Changing the order of transforms: If you realize that it is better for one step to execute before another one, you can change the order of how the steps are executed.
- Selecting previous steps: Clicking on any step prior to the current one will allow you to see how your query results would change one step earlier in the process.
With this understanding, you will now get hands-on with applying several basic transforms inside the Power Query Editor. The goal of these first sets of examples is to get you comfortable with the Power Query Editor user interface before the more complex use cases are covered.
Use First Row as Headers
Organizing column names or headers is often an important first task when managing your dataset. Providing relevant column names makes many of the downstream processes, such as building reports, much easier. Often, column headers are automatically imported from your data source, but sometimes you may be working with a more unique data source that makes it difficult for Power BI to capture the column header information. This walkthrough will show how to deal with such a scenario:
- Launch Power BI Desktop, and click Get data on the Home ribbon.
- Choose Excel, then navigate to and select Open on the
Failed Bank List.xlsx
file that is available in the book source files. - In the Navigator window, select the table called Data, then choose Transform Data. When the Power Query Editor launches, you should notice that the column headers are not automatically imported. In fact, the column headers are in the first row of the data.
- To push the column names that are in the first row of data to the header section, select the transform called Use First Row as Headers from the Home ribbon as shown in Figure 2.2:
Figure 2.2: Leveraging the Use First Row as Headers transform
Once complete, you will see the first row of the dataset has been promoted to the column header area. This is a very common transform that you can expect to use often with flat files. Next, let's look at another commonly used transform, Remove Columns.
Remove Columns
Often, the data sources you will connect to will include many columns that are not necessary for the solution you are designing. It is important to remove these unnecessary columns from your dataset because these unused columns needlessly take up space inside your data model. There are several different methods for removing columns in the Power Query Editor. This example will show one of these methods using the same dataset from the previous demonstration:
- Multi-select (Ctrl + click) the column headers of the columns you wish to keep as part of your solution. In this scenario, select the columns Bank Name, City, ST, and Closing Date.
- With these four columns selected, right-click on any of the selected columns and choose Remove Other Columns, as shown in Figure 2.3:
Figure 2.3: Selecting the Remove Other Columns transform
Once this transform is completed, you should be left with only the columns you need.
Another popular method for removing columns is clicking the Choose Columns button on the Home ribbon of the Power Query Editor. This option provides a list of all the columns, and you can choose the columns you wish to keep or exclude.
You can also select the columns you wish to remove; right-click on one of the selected columns and click Remove. This seems like the more obvious method. However, this option is not as user-friendly in the long run because it does not provide an option to edit the transform in the Applied Steps section like the first two methods do.
With any data cleansing tool, data type manipulation is critical and can help save you from many headaches later in the development of your solution. In the next section, you will learn about how to change data types.
Change Type
Defining column data types properly early on in your data scrubbing process can help to ensure proper business rules can be applied and data is presented properly in reports. The Power Query Editor has various numeric, text, and date-time data types for you to choose from. In our current example, all of the data types were automatically interpreted correctly by the Power Query Editor, but let's look at where you could change this if necessary:
- Locate the data type indicator on the column header to the left of the column name.
- Click the data type icon, and a menu will open that allows you to choose whichever data type you desire, as shown in Figure 2.4:
Figure 2.4: Choosing a different data type
Another method you can use for changing column data types is to right-click on the column you wish to change, then select Change Type and choose the new data type. You should always be careful when changing data types to ensure your data supports the change. For instance, if you change a column data type to a Whole Number while it has letters stored in it, Power BI will produce an error.
If you want to change multiple column data types at once, you can multi-select the necessary columns, then select the new data type from the Data Type property on the Home ribbon.
Many of the transforms you will encounter in the future are contextually based on the column data types you are working with. For example, if you have a column that is a date, then you will be provided with special transforms that can only be executed against a date data type, such as extracting the month name from a date column.
Understanding how to properly set data types in Power BI is often the first step to using more exciting transforms. In the next section, you will learn how Power BI can read from an example you provide to automatically create transform rules.
Column From Examples
One option that can make complex data transformations seem simple is the feature called Add Column From Examples. Using Add Column From Examples, you can provide the Power Query Editor with a sample of what you would like your data to look like, and it can then automatically determine which transforms are required to accomplish your goal. Continuing with the same failed banks example, let's walk through a simple example of how to use this feature:
- Find and select the Add Column tab in the Power Query Editor ribbon.
- Select the Column From Examples button and, if prompted, choose From All Columns. This will launch a new Add Column From Examples interface:
Figure 2.5: Choosing the Column from Examples transform
- Our goal is to leverage this feature to combine the
City
andST
columns together. In the first empty cell, typeBarboursville, WV
and then hit Enter. In Figure 2.5 you will notice that the text you typed has automatically been translated into an M query and applied for every row in the dataset. - Once you click OK, the transform is finalized and automatically added to the overall M query that has been built through the user interface. The newly merged column will be added with the rest of your columns and you can optionally rename the column something more appropriate by double-clicking on the column header:
Figure 2.6: Adding Column from Examples
As you can see, the Add Column from Examples feature is great because you don't have to be an expert in which transforms are appropriate because Power BI will automatically choose them for you!
Sometimes, you may encounter scenarios where the Add Column From Examples feature needs more than one example to properly translate your example into an M query function that accomplishes your goal. If this happens, simply provide additional examples of how you would like the data to appear in different rows, and the Power Query Editor should adjust to account for outliers.
Now that you have learned some basic transforms, let's explore some more complex design patterns that are still used quite frequently.