A data warehouse is a repository of enterprise data used for reporting and analysis. There have been three waves of data warehouses so far, which we will cover in the upcoming subsections.
The need for a data warehouse
Driven by IT
This is the first wave of business intelligence (BI). IT needed to separate operational data and databases from its origin for the following reasons:
- Keep data changes history. Some operational applications purge the data after a while.
- When users wanted to report on the application's data, they were often affecting the performance of the system. IT replicated the operational data to another server to avoid any performance impact on applications.
- Things got more complex when users wanted to do analysis and reports on databases from multiple enterprise's applications. IT had to replicate all the needed systems and make them speak together. This implied that new structures had to be built and new patterns emerged from there: star schemas, decision support systems (DSS), OLAP cubes, and so on.
Self-service BI
Analysts and users always need data warehouses to evolve at a faster pace. This is the second wave of BI and it happened when major BI players such as Microsoft and Click came with tools that enabled users to merge some data with or without data warehouses. In many enterprises, this is used as a temporary source of analytics or proof of concept. On the other hand, not every data could fit at that time in data warehouses. Many ad hoc reports were, and are still, using self-service BI tools. Here is a short list of such tools:
- Microsoft Power Pivot
- Microsoft Power BI
- Click
Cloud-based BI – big data and artificial intelligence
This is the third wave of BI. The cloud capabilities enable enterprises to do more accurate analysis. Big data technologies allows users to base their analysis on much bigger data volumes. This helps them deriving patterns form the data and have technologies that incorporate and modify these patterns. This leads to artificial intelligence or AI.
Technologies used in big data are not that new. They were used by many search engines in the early 21st century such as Yahoo! and Google. They have also been used quite a lot in research faculties in different enterprises. The third wave of BI broaden the usage of these technologies. Vendors such as Microsoft, Amazon, or Google make it available to almost everyone with their cloud offer.]