The need for a data warehouse
Before we dive deeper into data warehouses, let’s once again distinguish between a data lake and a data warehouse. The two systems address many overlapping use cases, and for the most common ones either can be used. However, there are major differences between them. Essentially, a data lake is a schema-on-read centralized repository that’s flexible enough to store structured, semi-structured, and unstructured data at any scale, and it allows all personas in an organization to derive value from this data easily and cost-effectively. A data warehouse, on the other hand, is a schema-on-write repository that stores structured and semi-structured data used for analytics and business intelligence (BI). It excels at data aggregations, slice-and-dice operations, roll-up and drill-down operations, data cubes, and other online analytical processing (OLAP) use cases. Both systems co-exist...
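To make the schema-on-read versus schema-on-write distinction concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not how any particular data lake or warehouse product works: the raw JSON lines stand in for files in a lake, an in-memory SQLite table stands in for a warehouse table, and the `sales` table, its columns, and all function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw events, as they might land in a data lake (one JSON document per line).
# Nothing validates them on arrival, so their shapes differ.
RAW_EVENTS = [
    '{"region": "EU", "product": "book", "amount": 12.5}',
    '{"region": "EU", "product": "pen", "amount": "3.0"}',  # amount arrives as a string
    '{"region": "US", "product": "book"}',                  # amount is missing
]


def read_with_schema(lines):
    """Schema-on-read: structure and types are applied only when the data is read."""
    rows = []
    for line in lines:
        doc = json.loads(line)
        amount = float(doc.get("amount", 0.0))  # coerce or default at read time
        rows.append((doc.get("region"), doc.get("product"), amount))
    return rows


def load_warehouse(rows):
    """Schema-on-write: the table definition is enforced when rows are inserted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        " region TEXT NOT NULL,"
        " product TEXT NOT NULL,"
        " amount REAL NOT NULL CHECK (amount >= 0))"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def roll_up_by_region(conn):
    """Roll-up: aggregate (region, product) detail rows into per-region totals."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


def rejects_incomplete_row(conn):
    """Return True if the warehouse table refuses a row that violates its schema."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", ("US", "pen", None))
    except sqlite3.IntegrityError:
        return True
    return False
```

In this sketch, the lake side accepts every record and defers interpretation to `read_with_schema`, while the warehouse side refuses a row with a missing amount at insert time. That up-front enforcement is what lets the roll-up query assume clean, typed columns.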