Why choose Amazon S3 as a data lake store?
Before we dive into the data and analytics use cases and explore how to design data lakes on AWS, it is important to understand why Amazon Simple Storage Service (Amazon S3) is the preferred choice for building a data lake, and why it serves as the storage layer that holds all kinds of data in a centralized location.
Recall from Chapter 1 that the ideal storage layer for a data lake should be inherently scalable, durable, highly performant, easy to use, secure, cost-effective, and integrated with the other building blocks of the data lake ecosystem. So, we ask a very important question: why choose Amazon S3 as a data lake store?
S3 checks all the boxes for what we look for in a data lake store. Here are some of the features of S3:
- Scalable: S3 is a petabyte-scale object store with virtually unlimited storage
- Durable: S3 is designed for 99.999999999% (11 9s) of data durability...
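To put the durability figure in perspective, here is a minimal sketch of the arithmetic. It assumes the 11 9s figure is read as the probability that any single object survives a given year, and it uses a hypothetical count of 10 million stored objects; both the object count and the variable names are illustrative and not taken from the text above.

```python
from decimal import Decimal

# Design target quoted in the text: 99.999999999% (11 9s) durability.
# Assumption: read as the chance that one object survives a given year.
DURABILITY = Decimal("0.99999999999")

# Hypothetical number of objects stored in the data lake.
OBJECT_COUNT = 10_000_000

# Chance that any single object is lost in a given year.
annual_loss_probability = 1 - DURABILITY

# Average number of objects expected to be lost per year across the lake.
expected_losses_per_year = annual_loss_probability * OBJECT_COUNT

# Average number of years between single-object losses.
years_per_expected_loss = 1 / expected_losses_per_year

print(f"Annual loss probability per object: {annual_loss_probability}")
print(f"Expected object losses per year: {expected_losses_per_year}")
print(f"Average years per lost object: {years_per_expected_loss}")
```

Decimal is used instead of float so that the subtraction from 1 is exact. This is a design-target estimate under the stated assumptions, not a guarantee about any particular dataset.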