Extending analytics with data warehouses/data marts
Tools such as Amazon Athena (which we cover in more depth in Chapter 11, Ad Hoc Queries with Amazon Athena) let us run SQL queries directly on data in the data lake. While this enables us to query very large datasets stored in Amazon S3, these queries are generally slower than queries run against data on a high-performance disk that is local to the compute engine.
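To make the idea of querying the data lake in place a little more concrete, the following is a minimal sketch of submitting a SQL query to Athena with the boto3 SDK. It is an illustration under stated assumptions, not the approach used later in this book: the helper name start_athena_query, the database datalake_db, the table web_logs, and the results bucket are all hypothetical placeholders.

```python
from typing import Any


def start_athena_query(
    athena_client: Any, sql: str, database: str, output_location: str
) -> str:
    """Submit a SQL query to Athena and return its query execution ID.

    athena_client is expected to behave like boto3.client("athena").
    Athena runs the query asynchronously against data in Amazon S3 and
    writes the results to output_location (an S3 URI).
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


def run_example() -> None:
    """Example call; requires boto3 and configured AWS credentials."""
    import boto3  # imported here so the helper above has no hard dependency

    client = boto3.client("athena")
    query_id = start_athena_query(
        client,
        # Hypothetical table and column names, for illustration only.
        sql="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
        database="datalake_db",  # hypothetical database name
        output_location="s3://example-athena-results/",  # hypothetical bucket
    )
    print(f"Submitted Athena query: {query_id}")
```

Calling run_example() only submits the query. Because Athena executes it asynchronously, a real program would then poll for completion (for example, with get_query_execution) before reading the results.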
Not all queries require the high performance of local storage, and we can group our queries and data into three categories. Let's take a look.
Cold data
This is data that is not frequently accessed but must be stored for long periods for compliance and governance reasons, or historical data that is kept to enable future research and development (such as for training machine learning models).
An example of this is the logs from a banking website. Unless there is a breach...