Leveraging R
R is a very powerful scripting language that is primarily used for advanced analytics tools, but also has several integration points within Power BI. One such integration is the ability to apply business rules to your data with the R language. Why is that important? Well, with this capability you can extend beyond the limits of the Power Query Editor and call functions and libraries from R to do things that would not normally be possible. In the next two sections, you will explore how to set up your machine to leverage R within Power BI and then walk through an example of using an R script transform.
There are many additional books and references you can read to learn more about the R scripting language, but for the purposes of this book, our goal is to inform you of what is possible when R and Power BI are combined.
Installation and configuration
To use R within Power BI, you must first install an R distribution for you to run and execute scripts against. In this book, we will leverage Microsoft's distribution, Microsoft R Open. It is an open source project and free for anyone to use. Once Microsoft R Open has been installed, you can then configure Power BI to recognize the home directory where R libraries may be installed. Let's walk through these setup steps together:
- Navigate to the website https://mran.microsoft.com/download/ to download and install Microsoft R Open.
- For the purposes of our example, you will select Download next to Windows.
- Once the download has completed, run the installation and accept all default settings and user agreements.
- Next, launch a new instance of Power BI Desktop to set up the R integration with Power BI. Click the menu options File | Options and settings | Options.
- Choose the R scripting section and ensure that the Detected R home directories property is filled with the R instance you just installed, as shown in Figure 2.23:
Figure 2.23: Mapping the R home directories in Power BI
- Once this is completed, click OK.
With this setup now complete, let's see how we can take advantage of R within Power BI.
The R script transform
With the R distribution now installed and configured to integrate with Power BI, you are now ready to see what's possible with these new capabilities. In this example, you will be looking at data from the European stock market. The problem with this dataset, which calls for it to be corrected with R, is that the file provided to you has missing values for certain days. So, to get a more accurate reading of the stock market, you will use an R package called MICE
to impute the missing values:
- Before beginning in Power BI, you should ensure that the
MICE
library is installed and available in the R distribution you set up in the last section. To do this, launch Microsoft R Open from your device. This is the basic RGui that was installed for you to run R scripts with.For many developers, the preferred method for writing
R
scripts is a free open source tool called RStudio. RStudio includes a code editor, debugging, and visualization tools that many find easier to work with. You can download RStudio from https://www.rstudio.com/. - Type the following script in the R Console window, and then hit Enter:
install.packages("mice")
This input is illustrated in the following screenshot:
Figure 2.24: Running the library install in RGui
- You can close the R Console window and return to Power BI Desktop after it returns an output like the following:
package 'mice' successfully unpacked and MD5 sums checked.
- In Power BI Desktop, start by connecting to the required data source called
EuStockMarkets_NA.csv
from the book source files. Once you connect to the file, click Transform Data to launch the Power Query Editor. - You will notice that there are a few days missing values in the SMI (Stock Market Index) column. We would like to replace values that show NA with approximate values using an R script. Go to the Transform ribbon, and select the Run R Script button on the far right.
- Use the following R script to call the
MICE
library that you recently installed to detect what the missing values in this dataset should be:# 'dataset' holds the input data for this script library(mice) tempData <- mice(dataset,m=1,maxit=50,meth='pmm',seed=100) completedData <- complete(tempData,1) output <- dataset output$completedValues <- completedData$"SMI missing values"
- Click OK. If you are prompted with a warning indicating Information is required about data privacy click Continue.
- Next, click on the hyperlink on the table value next to the
completedData
row to see the result of the newly implemented transform for detecting missing values.
This new output has replaced the missing values with new values that were detected based on the algorithm used within the R script. To now build a set of report visuals on this example, you can click Close & Apply on the Home ribbon.
This is just one simple way that R can be used with Power BI. You should note that in addition to using R as a transform, it can also be used as a data source and as a visual within Power BI.
While this book highlights the programming language R to extend the capabilities of Power BI, some might prefer Python. Python is another programming language that allows for extensibility into Power BI to create new data connectors, transforms, and visuals. So, should you choose R or Python? That depends on which you are more comfortable with. If you have already spent time learning Python
, then stick with that! In the next section of this chapter, you will learn about Power BI's AI integration features, which give you the ability to call on components of Azure Cognitive Services with the Power Query Editor.