Data modeling is the process of structuring and organizing data so that it can be easily analyzed and reported on. Think of it like arranging books in a library. If you just threw all the books into a room, it would be hard to find what you need. But if you categorize them by genre, author, or publication date, it becomes much easier to locate a specific book.
Similarly, data modeling organizes your data so that you can easily derive insights from it.
Just as a business plan serves as a blueprint for a company, a data model acts as a blueprint that defines and visualizes the relationships between different datasets. Designing this blueprint is what we call data modeling.
The data model serves as the backbone for your visuals and calculations, allowing for more complex data analysis. It gives you a visual or conceptual view of how the datasets you are working with connect to produce the results or insights you need. Getting it right can be the difference between well-optimized analytics and analytics cluttered with redundant data that offers little insight.
Microsoft offers the following definitions of a data model and of data modeling in the context of Excel and Power BI:
- A data model allows you to integrate data from multiple tables, effectively building a relational data source inside an Excel workbook.
- Data modeling is the process of analyzing and defining all the different data types your business collects and produces, as well as the relationships between those bits of data. By using text, symbols, and diagrams, data modeling concepts create visual representations of data as it’s captured, stored, and used in your business. As your business determines how data is used and when, the data modeling process becomes an exercise in understanding and clarifying your data requirements.
In Excel, a data model can help you connect to one or many tables and summarize the data with PivotTables.
Figure 1.1 – Comparing a one-table analysis to multiple-table analysis
The concept is not limited to Excel; it also applies to other tools and database management systems, such as Power BI, Microsoft Access, and Oracle.
With a data model, analyzing your data becomes easier because you can clearly define each dataset, the role it plays, and how it connects to other datasets to give you the results you need.
Comparing a one-table analysis to multiple-table analysis in Microsoft Excel
Often, we store our data in a range of cells in Microsoft Excel. Converting that range into a table makes it easier to reference the dataset in calculations and to analyze it further with a PivotTable. Referring to a table and its columns by name in formulas is called structured referencing. To insert a table, select any cell in the range and go to Insert > Table on the ribbon, or simply press Ctrl + T.
When data is stored in a table, simple aggregations such as SUM, AVERAGE, and COUNT can be performed by referring to the table name and the column name. For instance, summing the Sales column of a table named Table1 can be done simply with =SUM(Table1[Sales]).
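To see why name-based references are convenient, here is a minimal Python sketch of the same idea, offered only as an analogy to Excel's behavior. It assumes a hypothetical table held as a list of rows keyed by column header; the helper names (`column`, `table_sum`, `table_average`, `table_count`) are illustrative and are not part of Excel.

```python
from statistics import mean

# Hypothetical stand-in for an Excel table named Table1:
# each row is a dict keyed by its column header.
table1 = [
    {"Product": "Laptop", "Sales": 1200.0},
    {"Product": "Mouse", "Sales": 25.0},
    {"Product": "Monitor", "Sales": 300.0},
]


def column(table, name):
    """Return every value in the named column, similar to Table1[Sales]."""
    return [row[name] for row in table]


def table_sum(table, name):
    return sum(column(table, name))


def table_average(table, name):
    return mean(column(table, name))


def table_count(table, name):
    # Excel's COUNT only counts numeric cells, so skip text and booleans here too.
    return sum(
        1
        for value in column(table, name)
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )


print(table_sum(table1, "Sales"))  # 1525.0

# Adding a row is like typing a new row under the table: the
# name-based reference picks it up without editing any cell range.
table1.append({"Product": "Keyboard", "Sales": 75.0})
print(table_sum(table1, "Sales"))  # 1600.0
print(table_average(table1, "Sales"))  # 400.0
print(table_count(table1, "Sales"))  # 4
```

Because the calculations name the column rather than a fixed range such as B2:B4, the new row is included automatically, which mirrors how a structured reference grows with its table.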
Data in a table can also be used as the source of a PivotTable. When rows or columns are added to the table, refreshing the PivotTable picks up the new data automatically. This avoids having to update the PivotTable's source range of cells.
Most Excel users tend to store all their data in one table for their analysis, an approach we can call one-table analysis. There is nothing wrong with this approach. However, as your data grows and you need to bring other tables into your analysis, working with just one table and a PivotTable can quickly become complex.
Creating a data model in Power Pivot lets you analyze multiple tables together without complex lookup formulas such as VLOOKUP. It can also improve performance and gives you a clear overview of how the tables relate to each other.
Let’s now explore some of the key reasons to use a data model in Power Pivot:
- It gives you a broad overview of your datasets or tables, which helps ensure that all the tables and datasets your model requires are accurately captured. Take a look at the following example data model for a sales report.
Figure 1.2 – A Diagram view of an example sales report data model
Notice that even though several tables are used to create the final dashboard, the data model gives a clear overview of how each table connects and contributes to the final results.
- It is an abstract representation of the real-world situation you are analyzing. With the data model, you are in a good position to generate accurate measures and calculations for the KPIs in your report.
- The data model helps reduce redundant data, that is, the same data repeated at different points in your dataset. This helps maintain performance as your data grows.
- The data model can also be a good blueprint for developing web or frontend applications for your dataset. For example, PowerApps, AppSheet, Caspio, and Squirrel are some of the applications that can benefit from a well-designed data model.
Most of these are low-code tools that use data models as a blueprint to create interactive apps for users. The data model then becomes an indirect way for developers to document the data that will be required to build these apps.
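The redundancy point above can be illustrated with a small Python sketch (an analogy, not how Power Pivot stores data). It assumes a hypothetical flat sales table in which each customer's name and segment repeat on every order row; the helper `split_flat_table` is an illustrative name. It moves the repeated details into a lookup table stored once per Customer ID and keeps only the key in the fact table.

```python
# Hypothetical flat ("one-table") sales data: the customer's name and
# segment are repeated on every order row.
flat_sales = [
    {"Order": 1, "Customer ID": "C01", "Customer Name": "Ama Mensah", "Segment": "Retail", "Revenue": 250.0},
    {"Order": 2, "Customer ID": "C02", "Customer Name": "Kofi Asare", "Segment": "Corporate", "Revenue": 900.0},
    {"Order": 3, "Customer ID": "C01", "Customer Name": "Ama Mensah", "Segment": "Retail", "Revenue": 120.0},
]


def split_flat_table(rows, key, lookup_columns):
    """Split a flat table into a lookup table (one entry per key) and a fact table."""
    lookup = {}
    facts = []
    for row in rows:
        details = {col: row[col] for col in lookup_columns}
        existing = lookup.setdefault(row[key], details)
        # The same key should always carry the same details; otherwise the data is inconsistent.
        if existing != details:
            raise ValueError(f"Conflicting details for {key} {row[key]!r}")
        # The fact row keeps the key column but drops the repeated details.
        facts.append({col: val for col, val in row.items() if col not in lookup_columns})
    return lookup, facts


customers, orders = split_flat_table(flat_sales, "Customer ID", ("Customer Name", "Segment"))
print(customers)
# {'C01': {'Customer Name': 'Ama Mensah', 'Segment': 'Retail'}, 'C02': {'Customer Name': 'Kofi Asare', 'Segment': 'Corporate'}}
print(orders[0])
# {'Order': 1, 'Customer ID': 'C01', 'Revenue': 250.0}
```

Each customer's details now live in one place, so correcting a name or segment means changing a single row rather than every order that customer placed.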
So far, we have covered what a data model is and why you should consider using one to structure datasets that are split into related tables, which need to be connected and properly visualized to deliver the most efficiency and insight.
In the following section, we will look at some practical use cases of a data model. We will look at the cases of an accountant and a salesperson and see how data models can reduce the effort and steps required to analyze data.
Practical use cases for a data model
This section explores practical use cases of data models in various workplace scenarios.
The accountant
Mr. Owusu Yeboah is a chartered accountant. He enters his accounting records in a table on the Journal worksheet, which he created in Microsoft Excel to record the Date, Description, Amount, Debit Account, and Credit Account of every transaction.
Figure 1.3 – Journal showing accounting entries
In another worksheet named COA, he has a table containing his chart of accounts, with account codes that classify the accounts into assets, liabilities, equity, revenue, and expenses. The other columns in his chart of accounts describe how each account should be treated when producing the monthly and annual financial statements.
Figure 1.4 – Sample chart of accounts
For Mr. Owusu Yeboah to determine the debits and credits posted to each account or to create a trial balance, he would need many lookup formulas to connect the two tables. In addition, whenever new data is added to the tables, he must manually update all his workings to capture the new entries. Storing the data in Excel tables is one way to avoid manually updating calculations when the data changes.
How does a data model help in this situation?
Using a data model, Mr. Owusu Yeboah can load the two tables and connect them through the columns they have in common. These common columns establish the relationships between the tables that make up the data model. He can then add a calendar table to produce month-on-month or annual financial statements.
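As a rough illustration of what the relationships let the model do, the following Python sketch builds a trial balance from a journal and a chart of accounts. It is not Power Pivot code: the account codes, names, classes, and journal entries are hypothetical, and it assumes the Journal's Debit Account and Credit Account columns hold codes that match the COA's account codes. Each account's net balance is shown in the Debit or Credit column, so the two totals should agree.

```python
from collections import defaultdict

# Hypothetical chart of accounts (COA): account code -> (account name, class)
coa = {
    1000: ("Cash", "Asset"),
    3000: ("Owner's Capital", "Equity"),
    4000: ("Sales Revenue", "Revenue"),
    5000: ("Rent Expense", "Expense"),
}

# Hypothetical journal entries: each one debits one account and credits another
journal = [
    {"Date": "2024-01-02", "Description": "Owner investment", "Amount": 5000.0, "Debit Account": 1000, "Credit Account": 3000},
    {"Date": "2024-01-05", "Description": "Cash sale", "Amount": 1200.0, "Debit Account": 1000, "Credit Account": 4000},
    {"Date": "2024-01-31", "Description": "January rent", "Amount": 800.0, "Debit Account": 5000, "Credit Account": 1000},
]


def trial_balance(journal, coa):
    """Net each account's debits and credits and label it from the chart of accounts."""
    net = defaultdict(float)  # positive = debit balance, negative = credit balance
    for entry in journal:
        for col in ("Debit Account", "Credit Account"):
            # Both account columns relate to the COA's account code
            if entry[col] not in coa:
                raise ValueError(f"Account {entry[col]} is not in the chart of accounts")
        net[entry["Debit Account"]] += entry["Amount"]
        net[entry["Credit Account"]] -= entry["Amount"]

    rows = []
    for code in sorted(coa):
        name, account_class = coa[code]
        balance = net[code]
        rows.append({
            "Code": code,
            "Account": name,
            "Class": account_class,
            "Debit": balance if balance > 0 else 0.0,
            "Credit": -balance if balance < 0 else 0.0,
        })
    return rows


for row in trial_balance(journal, coa):
    print(row["Code"], row["Account"], row["Debit"], row["Credit"])
```

In the workbook itself, the same result would come from relationships and PivotTable calculations rather than code; the sketch only shows the lookup-and-aggregate logic those relationships replace.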
A calendar table in Excel is a table containing a continuous series of dates that helps you keep track of the dates and times in your data. It’s great for looking at things such as sales or expenses by day, month, or year. If your data is missing information for certain dates, a calendar table makes it easy to spot those gaps so you can fill them in. This helps ensure you’re not missing important details when making decisions.
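Here is a minimal Python sketch of the two ideas in this paragraph: generating one row per day between two dates, and comparing that calendar with the dates that actually have transactions to reveal gaps. The column names (Year, Month Number, Month, Quarter) and the sample dates are assumptions for illustration, not a required Excel layout.

```python
from datetime import date, timedelta


def build_calendar(start, end):
    """Return one row per day from start to end (inclusive) with common date attributes."""
    rows = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        rows.append({
            "Date": day,
            "Year": day.year,
            "Month Number": day.month,
            "Month": day.strftime("%b"),
            "Quarter": (day.month - 1) // 3 + 1,
        })
    return rows


def missing_dates(calendar_rows, transaction_dates):
    """Return the calendar dates on which no transaction was recorded."""
    recorded = set(transaction_dates)
    return [row["Date"] for row in calendar_rows if row["Date"] not in recorded]


# Hypothetical transaction dates with no entries on 3 January
sales_dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 5)]
calendar_rows = build_calendar(min(sales_dates), max(sales_dates))
print(len(calendar_rows))  # 5
print(missing_dates(calendar_rows, sales_dates))  # [datetime.date(2024, 1, 3)]
```

The key property is that the calendar has no gaps, even when the transaction data does, which is what makes missing days visible.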
In addition to helping Mr. Owusu Yeboah sort and analyze his data over time, a calendar table ensures that the dates in his various tables line up correctly. This helps him avoid mistakes and makes it easier to combine different sets of data. It also lets Excel perform more advanced calculations for him, such as his total sales for each month or averages over specific time periods.
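A month-by-month summary grouped through a calendar can be sketched in Python as follows. This is an analogy rather than Excel's implementation, and the revenue entries are hypothetical. Because the months come from a continuous calendar instead of from the entries themselves, a month with no activity still appears with a total of 0.0 rather than silently dropping out of the report.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_totals(entries, start, end):
    """Sum amounts per (year, month) for every month between start and end, including empty months."""
    # Walk the calendar day by day so that every month in the range gets a row
    months = []
    day = start
    while day <= end:
        key = (day.year, day.month)
        if not months or months[-1] != key:
            months.append(key)
        day += timedelta(days=1)

    totals = defaultdict(float)
    for entry_date, amount in entries:
        totals[(entry_date.year, entry_date.month)] += amount
    # Months with no entries show 0.0 instead of disappearing from the report
    return [(key, totals.get(key, 0.0)) for key in months]


# Hypothetical revenue entries: nothing was recorded in February
revenue = [(date(2024, 1, 5), 1200.0), (date(2024, 1, 20), 300.0), (date(2024, 3, 2), 500.0)]
for (year, month), total in monthly_totals(revenue, date(2024, 1, 1), date(2024, 3, 31)):
    print(year, month, total)
```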
His data model will look something like the following screenshot:
Figure 1.5 – A screenshot of a data model with accounting data
This will help him easily capture new information in the journal and chart of accounts and create a dynamic financial statement for his users.
The salesperson
Ferdinand Attobra is a sales executive at Finex, an online electronics shop. Every day, he must send his supervisors a report that captures the top-performing products, branches, and customers.
Figure 1.6 – Sales transactions
To create his report, he downloads four datasets from his sales software:
- Transactions: This table captures the revenue and cost of sales for each transaction. It also has fields that identify the customer, product, and store related to each transaction: Customer ID, Product ID, and Store ID.
Apart from the Transactions table, there are three other tables he uses to look up the details of each customer, product, or store that appeared in the Transactions table.
Figure 1.7 – Sample lookup tables
- Customers: This table lists each of the shop’s customers with their unique ID, name, and customer segment.
- Products: This table lists each product with its unique ID, category, sub-category, and name.
- Location: This table lists each store ID with its city, region, and country.
The challenge Ferdinand faces in creating his report is how he can use the various IDs stored in the Transactions table to look up the customer, product, and store involved in each transaction.
How does a data model help in this situation?
Using a data model, Ferdinand can load the Customers, Products, and Location tables and connect them to the Transactions table using the Customer ID, Product ID, and Store ID columns, respectively. A calendar table, which he creates as supplemental data rather than downloading it, can be connected to the Transactions table in the same way. He can then use this model to generate his daily reports and analyze sales by product, geography, customer, and date.
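The following Python sketch shows, by analogy, what those relationships do when a report is built: each transaction row carries only IDs, and the descriptive attribute used for grouping (category, region, or segment) is looked up in the related table. All table contents are hypothetical, and the helper `revenue_by` is an illustrative name rather than an Excel feature.

```python
from collections import defaultdict

# Hypothetical lookup tables, keyed by their ID columns
customers = {
    "C01": {"Name": "Ama Mensah", "Segment": "Consumer"},
    "C02": {"Name": "Kofi Asare", "Segment": "Corporate"},
}
products = {
    "P01": {"Category": "Computers", "Sub-Category": "Laptops", "Name": "Laptop 14in"},
    "P02": {"Category": "Accessories", "Sub-Category": "Mice", "Name": "Wireless Mouse"},
    "P03": {"Category": "Computers", "Sub-Category": "Monitors", "Name": "24in Monitor"},
}
locations = {
    "S01": {"City": "Accra", "Region": "Greater Accra", "Country": "Ghana"},
    "S02": {"City": "Kumasi", "Region": "Ashanti", "Country": "Ghana"},
}

# Hypothetical Transactions table that stores only IDs, not descriptive details
transactions = [
    {"Customer ID": "C01", "Product ID": "P01", "Store ID": "S01", "Revenue": 1500.0},
    {"Customer ID": "C02", "Product ID": "P02", "Store ID": "S02", "Revenue": 40.0},
    {"Customer ID": "C02", "Product ID": "P03", "Store ID": "S02", "Revenue": 350.0},
    {"Customer ID": "C01", "Product ID": "P02", "Store ID": "S02", "Revenue": 60.0},
]


def revenue_by(transactions, lookup, key_column, attribute):
    """Total Revenue grouped by an attribute of a related lookup table."""
    totals = defaultdict(float)
    for row in transactions:
        related = lookup[row[key_column]]  # follow the relationship via the shared ID
        totals[related[attribute]] += row["Revenue"]
    return dict(totals)


print(revenue_by(transactions, products, "Product ID", "Category"))
# {'Computers': 1850.0, 'Accessories': 100.0}
print(revenue_by(transactions, locations, "Store ID", "Region"))
# {'Greater Accra': 1500.0, 'Ashanti': 450.0}
print(revenue_by(transactions, customers, "Customer ID", "Segment"))
# {'Consumer': 1560.0, 'Corporate': 390.0}
```

In the data model, the same grouping happens when a lookup-table field is dragged onto a PivotTable's rows, with no lookup formulas in the Transactions table.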
The model will look like the following screenshot:
Figure 1.8 – A screenshot of a data model showing sales data
From the two case studies, we can see how Excel’s data model helps us overcome some of the typical challenges in our routine office work.
Excel’s data model allows you to integrate data from multiple sources efficiently. A diagram of the tables in a model and the relationships between them is called an Entity Relationship Diagram (ERD).
Figure 1.9 – Sample ERD for a sales report in Excel
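Each line in an ERD like this one represents a one-to-many relationship, and a relationship only works cleanly when its key columns behave. The Python sketch below checks two conditions, stated here as general background rather than as taken from the text above: for a one-to-many relationship in Power Pivot, the key column on the lookup ("one") side must contain unique values, and fact rows whose key has no match in the lookup table typically end up under a (blank) item in PivotTables. The function `check_relationship` and the sample rows are hypothetical.

```python
from collections import Counter


def check_relationship(lookup_rows, lookup_key, fact_rows, fact_key):
    """Check a proposed one-to-many relationship between a lookup table and a fact table.

    Returns (duplicate_keys, unmatched_keys):
    - duplicate_keys: values that appear more than once in the lookup table's key column
    - unmatched_keys: fact-table values with no matching row in the lookup table
    """
    counts = Counter(row[lookup_key] for row in lookup_rows)
    duplicate_keys = sorted(key for key, n in counts.items() if n > 1)
    unmatched_keys = sorted({row[fact_key] for row in fact_rows} - set(counts))
    return duplicate_keys, unmatched_keys


# Hypothetical data: P02 appears twice in Products, and P09 has no product row
products = [{"Product ID": "P01"}, {"Product ID": "P02"}, {"Product ID": "P02"}]
sales = [{"Product ID": "P01"}, {"Product ID": "P09"}]
print(check_relationship(products, "Product ID", sales, "Product ID"))  # (['P02'], ['P09'])
```

Running a check like this before building the model makes it easier to see why a relationship cannot be created or why a report shows unexplained blank rows.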
Apart from this key advantage, the data model can also do the following:
- Store and analyze data beyond the 1,048,576-row limit of an Excel worksheet. This brings a whole new capability to regular Excel.
- Create more powerful calculations, such as DAX (Data Analysis Expressions) measures, to help you analyze your data more efficiently.
- Work together with tools such as Power Query to transform and shape your data and maintain a dynamic connection to your data sources.
In the next section, we will dive into the main tool for data modeling and explore some best practices to help you get more insights from your datasets.