Dimensional modeling
A dimensional model is traditionally seen in OLAP systems such as data warehouses, and in data lakes processed with Apache Spark. The goal of dimensional modeling is to reduce duplication and create a central source of truth. One reason for reducing data duplication is to save on storage costs, although this is less of a factor with modern cloud storage. This data model consists of dimensions and facts. A dimension is an entity that we are trying to model in the real world, such as CUSTOMER, PRODUCT, DATE, or LOCATION, and a fact holds the numerical data, such as REVENUE, PROFIT, SALES $ VALUE, and so on. The primary key of each dimension flows to the fact table as a foreign key, but more often than not it is not enforced as a constraint in the database. Rather, it is managed through the process that loads and maintains the data, such as Extract, Transform, and Load (ETL). This data model is business-user-friendly and is used for analytical reporting and analysis...
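To make the relationship between dimensions and facts concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table names (dim_customer, dim_product, fact_sales), the column names, and the sample rows are all hypothetical and chosen only for illustration. The fact table deliberately declares no foreign key constraints; instead, a small load routine standing in for an ETL process resolves each dimension key before writing the fact row.

```python
import sqlite3

# Hypothetical dimension tables: table name -> (key column, name column).
DIMENSIONS = {
    "dim_customer": ("customer_key", "customer_name"),
    "dim_product": ("product_key", "product_name"),
}


def build_star_schema(conn):
    """Create two dimensions and one fact table, with no FK constraints."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL UNIQUE
        );
        -- The keys below refer to the dimensions, but the database does not
        -- enforce that; the load process is responsible for integrity.
        CREATE TABLE fact_sales (
            customer_key INTEGER NOT NULL,
            product_key  INTEGER NOT NULL,
            revenue      REAL NOT NULL,
            profit       REAL NOT NULL
        );
        """
    )


def get_dimension_key(conn, table, name):
    """Return the key for `name`, inserting a new dimension row if needed."""
    key_col, name_col = DIMENSIONS[table]  # identifiers come from a fixed map
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {name_col} = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} ({name_col}) VALUES (?)", (name,))
    return cur.lastrowid


def load_sales(conn, source_rows):
    """ETL-style load: resolve dimension keys, then write the fact rows."""
    for customer, product, revenue, profit in source_rows:
        customer_key = get_dimension_key(conn, "dim_customer", customer)
        product_key = get_dimension_key(conn, "dim_product", product)
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (customer_key, product_key, revenue, profit),
        )
    conn.commit()


def revenue_by_customer(conn):
    """Typical analytical query: join the fact to a dimension and aggregate."""
    return conn.execute(
        """
        SELECT c.customer_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.customer_name
        ORDER BY c.customer_name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sales(
    conn,
    [
        ("Acme", "Widget", 100.0, 30.0),
        ("Acme", "Gadget", 50.0, 10.0),
        ("Globex", "Widget", 200.0, 60.0),
    ],
)
result = revenue_by_customer(conn)
print(result)
```

Each customer and product name is stored once in its dimension, while the fact table carries only keys and numeric measures. Because the keys are resolved in load_sales rather than by a database constraint, a bug in the load process could write orphaned fact rows, which is the trade-off of managing integrity in ETL.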