Summary
In this chapter, you saw the challenges that data warehouses and data lakes face when used to design and implement large-scale data processing systems. We also looked at the need for businesses to move from simple descriptive analytics to advanced analytics, and at how existing systems cannot serve both needs simultaneously. Then, the data lakehouse paradigm was introduced, which addresses the challenges of both data warehouses and data lakes and bridges the gap between the two systems by combining the best elements of each. The reference architecture for data lakehouses was presented, and a few data lakehouse candidates drawn from existing commercially available, large-scale data processing systems were reviewed, along with their drawbacks. Next, an Apache Spark-based data lakehouse architecture was presented that makes use of Delta Lake and cloud-based data lakes. Finally, some advantages of data lakehouses were presented, along with a few...