
Streamlining Business Insights with Dataverse & Fabric

  • 9 min read
  • 11 Jan 2024



Introduction

AI and data analysis are critical for businesses to gain insights, optimize processes, and deliver value to their customers. However, working with data can be challenging, especially when it comes from various sources, formats, and systems. That’s why Microsoft offers two powerful platforms that can help you simplify your data and analytics needs: Microsoft Dataverse and Microsoft Fabric.

Microsoft Dataverse is a smart, secure, and scalable low-code data platform that lets you create and run applications, flows, and intelligent agents with common tables, extended attributes, and semantic meanings. 

Microsoft Fabric is a unified platform that can meet your organization’s data and analytics needs. It integrates data lake, data engineering, and data integration capabilities from Power BI, Azure Synapse, and Azure Data Factory into a single SaaS experience, providing one layer for managing data across the enterprise and enabling data scientists to apply models effectively using data from these sources.

In this article, we will learn how to consume data from Dataverse, prepare it for ML experiments alongside other data sources, apply a machine learning model, and finally produce an outcome that can be consumed again by users of Dataverse.

Business Scenario

In this blog, we will solve a problem for retail merchants selling products such as wine, fruits, meat, and fish through web and in-store experiences. The marketing department generates monthly sales offers but also wants to ensure these offers reach customers based on the revenue they generate throughout the year. While this is easy for existing customers with long-term relationships, many new customers would remain outside such business segmentation for 12 months, which makes it harder for marketing to engage them. So we will help the marketing team get predicted revenue for all customers so they can plan offers and realize the true potential of each customer.

Getting the environment ready

Let's understand where our data resides and how we can prepare it for the use case. The marketing team in this case uses the Microsoft Dynamics Marketing solution, so customer information is stored in Dataverse in a table called “Contact”. The transaction information is stored in an e-commerce solution hosted in Azure, using Azure SQL as data storage. To replicate the environment, we need to perform the following actions:

1. Download and deploy the “contact table” solution in the targeted Dataverse environment, following the steps defined at Microsoft Learn. The solution will deploy changes to the core Contact table and a Power Automate flow that will be required to populate the data.

2. Download “DataFile_Master” and place the file in OneDrive for Business in the same environment as Dataverse.

3. Open the Power Automate flow “ArticleFlow_loadCustomerData” deployed through the solution file, update the file location in the following action, and save.


Figure 1 - List action from Power Automate Cloud Flow

4. Run the flow so the data required for simulation is uploaded into the Contact table. After the flow executes, you should have the following data rows added to the Contact table.


Figure 2 - Data Created in Contact Table of Dataverse

5. Let's set up a link between Dataverse and Microsoft Fabric. This can be initiated by launching the Power Apps maker view and clicking Analyze > Link to Microsoft Fabric. Please refer to the documentation if you are performing this action for the first time. This will make the Dataverse tables accessible in Microsoft Fabric for all data science activities without any ETL.

6. The last step of setting up the environment is to bring the transaction summary information into Fabric. Here we will import the information into the Microsoft Fabric workspace created by the Dataverse synchronization. First, launch the Lakehouse from the workspace and choose the one created by the Dataverse synchronization. Its name will start with dataverse_<environment>, where <environment> is the name of your Dataverse environment.


Figure 3 - Microsoft Fabric auto-created workspace and Lakehouse for Dataverse

7. This will bring you to the Lakehouse Explorer view, where you can see all the Dataverse tables. Now, from the menu, select “Get Data > Upload files” to bring our transaction summary file into the Lakehouse. In a real scenario, this can also be done through the many features offered by Data Factory.


Figure 4 - Action to upload transaction file in Lakehouse Explorer

8. Download the ready file “RetailStoreTxnSummary_01” and select it in the prompt. This will upload the file into the Files folder of the Lakehouse.


9. Click the three dots next to the file and choose Load to table > New table. A prompt will ask for the table name; let's keep it the same as the file name. This will create a table with the summary information.


Figure 5 - Action to convert transaction summary file to Delta table
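
As an aside, if you prefer to script this step instead of using the Lakehouse UI, the same CSV-to-Delta load can be done from a Fabric notebook. Here is a minimal sketch, assuming the uploaded file is a CSV named RetailStoreTxnSummary_01.csv under the Files folder (both the extension and the exact path are assumptions):

```python
# Minimal sketch: load the uploaded CSV and save it as a Delta table.
# The file path below is an assumption; adjust it to match your upload.
df = (
    spark.read
    .option("header", "true")       # first row holds column names
    .option("inferSchema", "true")  # let Spark infer column types
    .csv("Files/RetailStoreTxnSummary_01.csv")
)

# Write the dataframe as a managed Delta table in the lakehouse.
df.write.format("delta").mode("overwrite").saveAsTable("retailstoretxnsummary_01")
```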


With that, we have completed the data preparation and can start on the machine learning experiment. Our Lakehouse now has a contact table mirrored from Dataverse and a retailstoretxnsummary_01 table created from the CSV file. We can now move to the next step of preparing data for the machine learning model.
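
Before moving on, a quick sanity check in a notebook can confirm both tables are visible to Spark. This is a minimal sketch; contactid is a standard Dataverse contact column, but verify the names in your own environment:

```python
# Sanity check: confirm both tables exist in the lakehouse catalog.
spark.sql("SHOW TABLES").show(truncate=False)

# Peek at the mirrored Dataverse contact table and the imported summary.
spark.read.table("contact").select("contactid").limit(5).show()
spark.read.table("retailstoretxnsummary_01").printSchema()
```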

Preparing data for machine learning model

As discussed in our business problem, we will create a linear regression model, using the customer information and transactional information to predict the expected revenue from customers in a 12-month cycle. Now that our data is available in Fabric, we can use a Fabric notebook to create a joined table that will serve as the source for training and testing the model. We will join the two tables and save the result in the temporary tbl_temp_traintestdata table. You can download the ready notebook file “01-CreateTable” and upload it using Open Notebook > Existing Notebook, or create a new one from Open Notebook > New Notebook.


Figure 6 - A snapshot of the notebook "01-CreateTable"
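
The original snippet appears only as a screenshot; the following is a minimal sketch of the same join, assuming a hypothetical shared contactid key between the two tables (the real notebook may use a different key or column selection):

```python
# Read the mirrored Dataverse contacts and the imported transaction summary.
contacts = spark.read.table("contact")
txns = spark.read.table("retailstoretxnsummary_01")

# Join customer profiles with their transaction summaries.
# Using "contactid" as the shared key is an assumption for illustration.
joined = contacts.join(txns, on="contactid", how="inner")

# Persist the combined dataset for training and testing the model.
joined.write.format("delta").mode("overwrite").saveAsTable("tbl_temp_traintestdata")
```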

Creating and registering ML model

In this step, we will use the data in our table “tbl_temp_traintestdata” to build a linear regression model. We are using linear regression because it is simple, easy to learn, and suitable for numerical predictions. We will use PySpark; you can also download the ready file “02-CreateMLMode”.

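The blog shows this notebook as screenshots. Below is a minimal sketch of the overall approach with Spark ML and MLflow; the feature list (only the two spend columns named later in this article) and the registered model name revenue-prediction-model are assumptions for illustration, not the notebook's actual contents:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
import mlflow
import mlflow.spark

# Load the combined dataset created in the previous step.
data = spark.read.table("tbl_temp_traintestdata")

# 80/20 split for training and testing.
train_df, test_df = data.randomSplit([0.8, 0.2], seed=42)

# Hypothetical feature subset; extend with the other summary columns.
feature_cols = ["MntDairyProds", "MntBeverageProds"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="crffa_revenue")

with mlflow.start_run():
    # Fit the pipeline and register the model for later batch scoring.
    model = Pipeline(stages=[assembler, lr]).fit(train_df)
    mlflow.spark.log_model(
        model, "model",
        registered_model_name="revenue-prediction-model",  # hypothetical name
    )
```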

Analyse the model performance

Let’s quickly assess the model performance using three evaluation criteria. First, RMSE (Root Mean Squared Error): on the training data it is approximately 9535.31, and on the test data approximately 8678.23. The lower the RMSE, the better the model’s predictions. It’s worth noting that RMSE is sensitive to outliers, meaning a few large errors can significantly increase it. Second, R2 (R-squared): the R2 for the training data is approximately 0.78 (or 78%), and for the test data approximately 0.82 (or 82%). This means that about 78% of the variability in the training data and 82% of the variability in the test data can be explained by the model. The model seems to perform reasonably well, as indicated by the relatively high R2 values; however, the RMSE values suggest there may be some large errors, possibly due to outliers in the data. Lastly, we analyzed the coefficients to identify the importance of the selected features, with the following results:


Figure 7 - Coefficients of features

We identified that MntDairyProds (42.06%) has the highest coefficient, meaning it has the most significant influence on the model’s predictions: a change in this feature will have a substantial impact on the predicted crffa_revenue. MntBeverageProds (0.05%) and the other features with smaller coefficients have less influence on the predictions, though they still contribute to the model’s ability to predict crffa_revenue accurately.
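
Both the error metrics and the coefficient table above can be reproduced from the fitted pipeline. Here is a minimal sketch, reusing the names from the training sketch earlier (your actual values will depend on the data and feature list):

```python
from pyspark.ml.evaluation import RegressionEvaluator

# Score both splits with the fitted pipeline.
train_preds = model.transform(train_df)
test_preds = model.transform(test_df)

# Compute RMSE and R2 on training and test data.
for metric in ("rmse", "r2"):
    evaluator = RegressionEvaluator(
        labelCol="crffa_revenue", predictionCol="prediction", metricName=metric
    )
    print(metric,
          "train:", evaluator.evaluate(train_preds),
          "test:", evaluator.evaluate(test_preds))

# Pair each feature with its learned coefficient; the last pipeline
# stage is the fitted LinearRegressionModel.
lr_model = model.stages[-1]
for name, coef in zip(feature_cols, lr_model.coefficients):
    print(f"{name}: {coef:.4f}")
```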

Since data cleansing is not in the scope of this blog, we will accept these results and move to the next step, which is using the model to perform batch prediction on new customer profiles.

Using the model for batch prediction

We will now use the registered model to batch-predict new customer profiles. We will fetch the rows where the current revenue is zero and pass that dataset to the model for prediction. Finally, we will save the records so we can post them back to Dataverse. A ready file “03– UseModelandSavePredictions” can be downloaded.

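The original code is shown as screenshots; here is a minimal sketch of that flow, with the model URI, version, join key, and output table name carried over as assumptions from the earlier sketches:

```python
import mlflow.spark

# Load the registered model; name and version are assumptions.
model = mlflow.spark.load_model("models:/revenue-prediction-model/1")

# Fetch customer profiles whose recorded revenue is still zero.
new_customers = spark.read.table("tbl_temp_traintestdata").filter("crffa_revenue = 0")

# Score them and persist the results for posting back to Dataverse;
# the output table name here is hypothetical.
predictions = model.transform(new_customers)
(predictions
    .select("contactid", "prediction")
    .write.format("delta").mode("overwrite")
    .saveAsTable("customer_revenue_predictions"))
```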

After executing this code, we will have a new table in OneLake with the predicted revenue. This table can then be synchronized back to Dataverse using a virtual table configuration and used there.


Figure 8 - Output snapshot from OneLake SQL analytics endpoint

Detailed instructions are available at Microsoft Learn for making the prediction table available in Dataverse, so the marketing team can use the predicted values for segmentation or personalized offerings.


Figure 9 - Virtual table configured in Dataverse

Conclusion

In this blog, we have learned how to use Microsoft Fabric to create a regression model that predicts customer revenue, leveraging data from Dataverse and other sources. We have also seen how to consume the model through simple PySpark code. Moreover, we have explored how to integrate the model with Power Platform, opening up many new possibilities.

This is just one example of how the platform can accelerate collaboration to enable business scenarios that traditionally would have required data scientists and many days of engineering. There are many more possibilities and benefits of using these tools, such as scalability, security, governance, and collaboration. We encourage you to experiment with and explore both Dataverse and Fabric to discover new ways of solving your business problems and creating value for your customers.

Author Bio

Mohammad Adeel Khan is a Senior Technical Specialist at Microsoft, a seasoned professional with over 19 years of experience across various technologies and digital transformation projects. At work, he engages with enterprise customers across geographies and helps them accelerate digital transformation using Microsoft Business Applications, Data, and AI solutions. In his spare time, he collaborates with like-minded professionals and helps solve business problems for nonprofit organizations using technology.

Adeel is also known for his unique approach to learning and development. During the COVID-19 lockdown, he introduced his 10-year-old twins to Microsoft Learn. The twins not only developed their first Microsoft Power Platform app—an expense tracker—but also became one of the youngest twins to earn the Microsoft Power Platform certification.