Designing a data lake architecture
Before cloud platforms, organizations ran clusters in their own data centers, with large amounts of storage that applications pushed data to for analytics. When storage ran low, the organization had to either delete data from the cluster or add capacity. Ordering new hardware was costly and often came with long lead times. As cloud platforms have grown in popularity, businesses and organizations have used their virtually unlimited storage and compute to develop new ways of storing and processing data. One of the most common architectures for data analysis is the data lake. The data lake architecture takes advantage of the virtually unlimited storage that cloud platforms provide and can scale storage and compute independently. It stores an organization's data in a single location, where any user can query it using the application best suited to the use case. Any data that was too large or expensive to store on an on-premises...