Extending analytics with data warehouses/data marts
Tools such as Amazon Athena (which we will explore in more depth in Chapter 11, Ad Hoc Queries with Amazon Athena) allow us to run SQL queries directly on data in the data lake. While this lets us query very large datasets stored in an Amazon S3 data lake, these queries generally run more slowly than queries against data stored on high-performance disks that are local to the compute engine.
However, not all queries require this kind of high performance, so we can categorize our queries and data into cold, warm, and hot tiers. Before diving into the topic of data marts and data warehouses, let’s first take a look at the different query and data storage tiers that are common in data lake projects.
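To make the idea of tiering more concrete, here is a minimal, illustrative sketch of a rule-based classifier that assigns a dataset to the cold, warm, or hot tier. The `Dataset` fields, the `classify` function, and both thresholds are hypothetical assumptions chosen only for illustration. They are not an AWS API, and real projects would pick their own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


@dataclass
class Dataset:
    name: str
    queries_per_day: float      # how often the data is queried
    max_latency_seconds: float  # slowest query response users will accept


# Hypothetical thresholds, for illustration only.
HOT_MAX_LATENCY_SECONDS = 2.0
WARM_MIN_QUERIES_PER_DAY = 1.0


def classify(ds: Dataset) -> Tier:
    # Data that must answer queries very quickly needs high-performance storage.
    if ds.max_latency_seconds <= HOT_MAX_LATENCY_SECONDS:
        return Tier.HOT
    # Regularly queried data that can tolerate slower responses.
    if ds.queries_per_day >= WARM_MIN_QUERIES_PER_DAY:
        return Tier.WARM
    # Rarely queried data with relaxed latency requirements.
    return Tier.COLD


if __name__ == "__main__":
    datasets = [
        Dataset("sales_dashboard", queries_per_day=500, max_latency_seconds=1),
        Dataset("monthly_report", queries_per_day=3, max_latency_seconds=60),
        Dataset("audit_archive", queries_per_day=0.01, max_latency_seconds=3600),
    ]
    for ds in datasets:
        print(ds.name, classify(ds).value)
```

Running this prints `sales_dashboard hot`, `monthly_report warm`, and `audit_archive cold`. The sketch checks latency before query frequency because, in this simplified model, a strict response-time requirement is what pushes data into the hot tier, regardless of how often it is queried.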
Cold and warm data
We’ve grouped the cold and warm data tiers into one section because, when building on AWS, both of these tiers generally use Amazon S3 storage...