Designing an Azure data lake
If you have been following big data technologies, you have almost certainly come across the term data lake. Data lakes are distributed data stores that can hold very large volumes of diverse data. They can store different types of data, such as structured, semi-structured, unstructured, and streaming data.
A data lake solution usually comprises a storage layer, a compute layer, and a serving layer. The compute layer could include Extract, Transform, Load (ETL), batch, or stream processing. There is no fixed template for creating a data lake; each one can be unique and optimized for the owning organization's requirements. However, there are a few general guidelines for building effective data lakes, and we will learn about them in this chapter.
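To make the three layers concrete, here is a minimal, purely illustrative sketch in Python. It does not use any Azure service; the names raw_zone, batch_average, serving_table, and serve are hypothetical stand-ins chosen for this example, and an in-memory list plays the role of the storage layer.

```python
import json
from collections import defaultdict

# Storage layer (assumption: an in-memory list standing in for a lake store).
# Records are kept in their raw form, here as JSON lines.
raw_zone = [
    '{"device": "a", "temp": 21.5}',
    '{"device": "b", "temp": 19.0}',
    '{"device": "a", "temp": 22.5}',
]


def batch_average(raw_lines):
    """Compute layer: a batch job that parses raw records and
    averages the temperature per device."""
    totals = defaultdict(lambda: [0.0, 0])  # device -> [sum, count]
    for line in raw_lines:
        record = json.loads(line)
        totals[record["device"]][0] += record["temp"]
        totals[record["device"]][1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


# Serving layer: curated results that downstream consumers can query.
serving_table = batch_average(raw_zone)


def serve(device):
    """Return the curated average for a device, or None if unknown."""
    return serving_table.get(device)
```

In a real data lake, each of these pieces would be a separate, scalable service, but the flow is the same: raw data lands in storage, a compute job transforms it, and the results are exposed for consumption.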
How is a data lake different from a data warehouse?
The main difference between a data lake and a data warehouse is that a data warehouse stores structured...