Data lake design for big data
In this section, we contrast data lakes with traditional data warehouses and cover core design patterns. This sets the stage for the hands-on tools and implementation coverage in the final “How to” section. Let’s start with the baseline of modern data architecture: the data warehouse.
Data warehouses
Data warehouses have been the backbone of business intelligence and analytics for decades. A data warehouse is a repository of integrated data from multiple sources, organized and optimized for reporting and analysis.
The key aspects of the traditional data warehouse architecture are as follows:
- Structured data: Data warehouses typically store only structured data, such as transaction data from databases and CRM systems. Unstructured data from documents, images, social media, and so on is not included.
- Schema-on-write: The data structure and schema are defined upfront during data warehouse design. This...
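To make the schema-on-write idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a warehouse. The `sales` table, its columns, and the `load_row` helper are hypothetical names chosen for illustration only. A real warehouse would enforce its schema through its own DDL and load tooling.

```python
import sqlite3


def create_warehouse() -> sqlite3.Connection:
    """Create an in-memory database whose schema is fixed before any data arrives."""
    conn = sqlite3.connect(":memory:")
    # Schema-on-write: structure and types are declared upfront.
    # The CHECK is needed because SQLite is loosely typed by default.
    conn.execute(
        """
        CREATE TABLE sales (
            order_id INTEGER NOT NULL,
            customer TEXT    NOT NULL,
            amount   REAL    NOT NULL CHECK (typeof(amount) = 'real')
        )
        """
    )
    return conn


def load_row(conn: sqlite3.Connection, row: tuple) -> bool:
    """Try to write one row; return False if it violates the declared schema."""
    try:
        conn.execute(
            "INSERT INTO sales (order_id, customer, amount) VALUES (?, ?, ?)",
            row,
        )
        return True
    except sqlite3.IntegrityError:
        # The row is rejected at write time, before it ever reaches the table.
        return False


conn = create_warehouse()
accepted = load_row(conn, (1, "Acme", 19.99))           # conforms to the schema
rejected = load_row(conn, (2, "Beta", "not-a-number"))  # wrong type for amount
missing = load_row(conn, (3, None, 5.0))                # required field is NULL
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(accepted, rejected, missing, row_count)
```

Only the first row is stored. The other two are refused when they are written, so non-conforming data never enters the table.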