There are many BI tools and programming languages that will help you along the way to visualize and explore data in many ways. In Chapter 5, we will discuss some of the tools out there in the market. In this section, we will discuss how to use them. Regardless of the tool type, there are data manipulation techniques that every data tool should do as a minimum.
As we define each of these, we will discuss the tool-less alternative and find the cost-benefit of using a BI tool to perform each of these. There are two basic concepts we need to understand before moving on – metrics and attributes. These can also be unveiled by asking What do we want to measure? and How do we want to describe such a measurement?
For example, if we measure sales by region, what are we measuring? Sales. What describes such measurements? Regions. This basic statement, although simplistic, is the basis of every analysis and exploration. Many complex analyses will derive from asking such questions. With additions, filters, and other types of calculations, you can make a more robust report.
Another basic term we need to understand is granularity. Granularity in Business Intelligence (BI) refers to the level of detail or the degree of aggregation of the data that is analyzed. It refers to the size of the individual units of data that are examined.
For example, if you analyze sales data, the granularity can be at the level of individual transactions or a higher level of aggregation, such as monthly sales, quarterly sales, or annual sales. The level of granularity can have a significant impact on the insights that can be derived from the data. Granularity may be accompanied by a temporal dimension. It is important to notice the difference because repeating a record over time doesn’t mean the granularity changes if the analysis is made using different measures of time (i.e., year, month, or day).
Here’s a mental exercise on granularity.
If we have a dataset that describes employees' performance, each record could represent how many sales we had in a department by month. Each row has a sales amount, a department ID, and the number of employees.
Granularity level: department
Figure 2.2 – A table depicting the granularity at the department level by month. In this table, granularity shows when a row is unique, and there are no two rows with the same department ID in a given month
The temporal unit of measurement here is month, so we can expect departments to repeat or not every month. Different model techniques may repeat a department with zero sales in a given month, but this will totally depend on the preferences of the design. Zero sales could mean no sales, but on some occasions, this may represent sales that add up to zero for many reasons (discounts, gifts, promotions, or devolutions).
Now, if we want to analyze on a more detailed level, we could change the level of granularity to the employee; every record would have the following: Employee ID, Department ID, Sales Amount, and Date.
Granularity level: employee
Figure 2.3 – A table depicting granularity at the employee level. In this table, granularity shows when a row is unique. There are no two rows with the same employee ID on a given date, and sales are added up and stored daily
The temporal unit now is Date
, a day in the life of a salesperson. Again, many experts in dimensional modeling may offer different ways to represent this; some may require each individual sale during a day, and some would show the summary of sales by an employee during that day.
Can we go one level further down? Certainly – let’s imagine now records storing sales of every product by every employee on a given date. This is exactly the kind of analysis we could find in master-detail modeling. This is where the header of the sale may have a summary of what was sold and the detail level contains each individual product, itemized with the proper measurements of such sale – for example, quantity, stock-keeping unit (SKU), and unit of measurement.
In BI, master-detail modeling is a data modeling technique used to represent the hierarchical relationships between data entities. It involves creating a relationship between two tables or datasets, where one table is the master and the other is the detail.
The master table contains the main data elements, while the detail table contains the related data elements that are associated with the main data elements. For example, in a sales analysis, the master table may contain information about sales transactions, while the detail table may contain information about the products sold in each transaction.
The master-detail relationship is typically created by defining a primary key in the master table and a foreign key in the detail table. The foreign key is used to link the detailed data to the corresponding master data.
Master-detail relationships are often used in reporting and analysis to drill down from summary information to more detailed information. For example, a report may show total sales by product category, with the ability to drill down to see the sales by individual products within each category.
Master-detail relationships are also used in data visualization tools to create interactive dashboards and reports. By using the master-detail relationship, a user can interactively explore the data, filter it, and drill down to view more detailed information as needed.
Master-detail relationships are an important data modeling technique in BI, allowing for the flexible and powerful analysis and reporting of hierarchical data structures.
As you can see, granularity can go up and aggregate detailed data, but it definitely can go down to levels that you can find deep inside an online transactional processing (OLTP) when your customers require it. A high level of granularity means that data is analyzed at a more detailed level, which can provide more specific insights but may require more time and effort to analyze. On the other hand, a lower level of granularity means that the data is analyzed at a more summarized level, which can provide a broader view of the data but may miss out on important details.
The choice of granularity depends on the specific business problem and the goals of the analysis. Generally, the level of granularity should be chosen based on the level of detail required to support the business decisions that need to be made. Data modelers and architects may decide to specify a low granularity level even though reports and dashboards are shown at the department, organization, or even company level. Part of their job involves building future-proof data structures; hence, they may find it advantageous to define a fine granularity level so that other analyses become supportable when the business requires it. However, modeling techniques are out of the scope of this book.
By understanding metrics, attributes or dimensions, and granularity, we can extrapolate to other concepts that follow these three, as they represent the lowest level in terms of “data hierarchy.” Now, we go up, and aggregations are at the next level. In BI, aggregations refer to the process of summarizing or grouping data into higher-level units, such as totals, averages, counts, or percentages. The purpose of aggregation is to make data more manageable, understandable, and actionable.
Aggregations are used to reduce the volume of data to be analyzed so that it can be processed more efficiently and effectively. By summarizing data into higher-level units, it becomes easier to identify patterns, trends, and outliers in the data.
For example, in a sales analysis, aggregating data by month or by product category can help identify which products are selling well and which ones are not. Aggregations can also be used to compare performance over time or across different regions or customer segments.
Aggregations can be performed at different levels of detail, depending on the business needs and the data available. Aggregations can be pre-calculated and stored in a data warehouse, or they can be calculated on the fly using BI tools and technologies.
A BI tool should be able to create aggregations with little to no performance issues, as a good data model is based on the premise that users can take advantage of the aggregation engine behind a BI tool. An example of aggregations in different technologies is shown here:
For SQL, you can have aggregation at the following levels:
- Aggregation at a departmental level:
Select region, organization, department, sum(amount) as sales_amount
From sales_table
Group by region, organization, department
- Aggregation at an organization level:
Select region, organization, sum(amount) as sales_amount
From sales_table
Group by region, organization
- Aggregation at a regional level:
Select region, sum(amount) as sales_amount
From sales_table
Group by region
Consider doing the same operation in Excel, as shown in the following screenshot. You can use the interface to aggregate at different levels and obtain the same result:
Figure 2.4 – Pivot table functionality, as shown in an Excel spreadsheet
If you want to make large and complex datasets more manageable and meaningful, thus enabling better decision-making based on actionable insights, aggregations are the BI technology that will achieve it.
When picking the right tool, you have to make sure you can perform these basic activities:
- Creating metrics or calculations
- Analyzing them by dimensions
- Exploring granularity
- Aggregating data to different levels
Now that we know we can create a report with aggregations, calculations, and attributes from a certain granularity, the report itself should have some level of interaction with a user. This interaction sometimes allows the user to navigate the data in an easier way and, at the same time, get every metric to recalculate for the different scenarios required. We call this drilling down and rolling up.
In BI, drilling down and rolling up are techniques used to navigate through hierarchical data structures and analyze data at different levels of detail.
Drilling down refers to the process of moving from a higher-level summary of data to a lower-level detail. For example, starting with a report that shows total sales by region, drilling down would involve clicking on a specific region to see the sales by country, and then clicking on a specific country to see the sales by city.
Rolling up, on the other hand, refers to the process of moving from a lower-level detail to a higher-level summary. For example, starting with a report that shows sales by city, rolling up would involve aggregating the data to show the sales by region, and then by country.
Drilling down and rolling up are often used together to analyze data at multiple levels of detail. By drilling down to lower levels of detail, analysts can gain insights into the factors that drive overall trends. By rolling up to higher levels of summary, analysts can identify patterns and trends across different regions or segments.
Drilling down and rolling up can be performed manually by analysts using BI tools, or they can be automated through the use of drill-down and roll-up functionality in reporting and analysis tools.
Overall, drilling down and rolling up are important techniques in BI that enable analysts to explore data at different levels of detail, gaining insights that can inform decision-making and drive business performance.
A successful tool should allow us to navigate these data structures up (the aggregation levels) and down (the granularity) with different dimensions and the recalculation of metrics, helped by modeling techniques. One modeling technique a BI tool should allow us to create is known as hierarchies. In BI, a hierarchy is a way of organizing data elements into a logical structure that reflects their relationships and dependencies. Hierarchies are used to represent complex data relationships in a simplified and intuitive way, making it easier for users to navigate and analyze the data.
A hierarchy consists of a series of levels, with each level representing a different category or dimension of data. For example, in a sales analysis, a hierarchy may be defined as follows:
- Level 1: Year
- Level 2: Quarter
- Level 3: Month
- Level 4: Week
- Level 5: Day
Each level in the hierarchy contains a set of members, which represent the values for that level. For example, the members for the month level might include January, February, and March.
Hierarchies can be used to organize data for reporting and analysis and to facilitate drilling down and rolling up through different levels of detail. For example, a user might start by looking at total sales for the year, and then drill down to see the sales by quarter, month, week, and day.
Hierarchies can also be used to define relationships between different data elements. For example, a hierarchy might be defined that relates products to product categories, which in turn are related to product departments.
Hierarchies are really an important concept in BI that enable users to navigate and analyze complex data structures intuitively and efficiently.
As we learn about hierarchies, dimensions, calculations, aggregations, and the act of drilling down and rolling up the granularity level, we can now conclude what things we should be able to do at a minimum with a BI tool. The act of putting all of these together in an ad hoc report is called slicing and dicing. In BI, slice and dice is a technique used to analyze data by selecting a subset of data (slicing) and then examining it from different perspectives (dicing). It allows users to break down data into smaller, more manageable parts and analyze them from different angles to gain deeper insights.
Slicing involves selecting a subset of data based on a specific dimension or category. For example, slicing by time might involve selecting data for a specific month, quarter, or year. Slicing by location might involve selecting data for a specific region, country, or city.
Dicing involves examining the sliced data from different perspectives or dimensions. For example, dicing by product might involve analyzing sales data by product category, brand, or SKU. Dicing by the customer might involve analyzing sales data by demographic, loyalty level, or purchase history.
Together, slicing and dicing allow users to drill down into specific areas of interest and then analyze them in more detail from different perspectives. For example, a user might start by slicing the data by time to look at sales for a specific quarter, and then dice the data by product to look at sales by category, brand, and SKU.
Slice and dice functionality is often built into BI tools and software, allowing users to easily select and analyze data based on different dimensions and categories. It enables users to quickly identify trends, patterns, and outliers in data and make informed decisions, based on the insights gained from the analysis.
This is it – these are the basis for any robust analysis and charts. Trend analysis, forecasting, time series, bar charts, pie charts, scatter plots, correlations, and so on – it all derives from being able to perform such basic operations on top of your dataset. If you are in search of a good BI tool, performing such activities will guarantee you find your BI career path, as they will show you not the end of your roadmap but, instead, spark new ideas and help you understand your data gaps and needs.