The nature of data warehouses and data lakes
A data warehouse (DW or DWH) is a central repository of current and historical data that has been integrated from one or more disparate sources. Also referred to as an enterprise data warehouse (EDW), the DWH is a system used for data analysis and reporting, and it is usually considered the core of an enterprise business intelligence strategy.
Data stored in a DWH comes from multiple systems, including operational systems such as CRM systems. To ensure data quality, the data may need to undergo a set of data cleansing activities before it can be loaded into the DWH.
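To make the cleansing step described above more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample CRM records, the cleansing rules (trim and normalize fields, drop rows without an email, de-duplicate by ID), the dim_contact table name, and the use of an in-memory SQLite database as a stand-in for a real DWH. Actual cleansing rules depend on the source systems and the organization's data quality requirements.

```python
import sqlite3

# Hypothetical raw records exported from a CRM system (illustrative data only).
raw_contacts = [
    {"id": "1", "email": " Ada@Example.com ", "country": "uk"},
    {"id": "2", "email": "grace@example.com", "country": "US"},
    {"id": "2", "email": "grace@example.com", "country": "US"},  # duplicate row
    {"id": "3", "email": "", "country": "de"},                   # missing email
]


def cleanse(records):
    """Normalize fields, drop rows with no email, and de-duplicate by id."""
    seen_ids = set()
    clean = []
    for row in records:
        email = row["email"].strip().lower()
        if not email or row["id"] in seen_ids:
            continue  # skip incomplete or duplicate rows
        seen_ids.add(row["id"])
        clean.append((int(row["id"]), email, row["country"].strip().upper()))
    return clean


def load(rows, conn):
    """Load cleansed rows into a hypothetical warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_contact ("
        "id INTEGER PRIMARY KEY, email TEXT NOT NULL, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_contact VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the DWH in this sketch.
conn = sqlite3.connect(":memory:")
clean_contacts = cleanse(raw_contacts)
load(clean_contacts, conn)
```

Cleansing before the load, rather than after, means the warehouse table can enforce constraints such as a primary key and a non-null email without rejecting rows at insert time.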
Some DWH tools have built-in extract, transform, load (ETL) capabilities, while others rely on third-party tools (we will cover ETL tools and other integration middleware in Chapter 3, Core Architectural Concepts – Integration and Security). This ETL capability ensures that the ingested data meets a defined level of quality and structure. Data might be staged in a specific staging...