Discovering when a data lake does not suffice
Data lakes were supposed to fix all the deficiencies of warehouses, but did they? Let's find out. The biggest contribution of data lakes was to unify all kinds of data in an open format at all velocities, including real-time ingestion and analytics. Yes, a big check mark for this one. We can now keep unstructured data right alongside structured data, making it a true first-class citizen. Likewise, we can enable ML on this data and use high-level languages to work with it through open APIs. Another big advantage is that the data remains in an open format, so every tool in the ecosystem can use it effectively from a single source of truth. No more vendor lock-in!
What data lakes had to give up was effective BI, which warehouses were so good at. Exploratory Data Analysis (EDA), which does not have such stringent SLAs, was enabled, but core BI reporting...