Building analytical data stores using cloud data lakes
In this section, you will explore the advantages afforded by cloud-based data lakes for big data analytics systems, and then understand some of the challenges facing big data analytics systems while leveraging cloud-based data analytics systems. You will also write a few PySpark code examples to experience these challenges first-hand.
Challenges with cloud data lakes
Cloud-based data lakes offer unlimited, scalable, and relatively inexpensive data storage. They are offered as managed services by the individual cloud providers and offer availability, scalability, efficiency, and lower total cost of ownership. This helps organizations accelerate their digital innovation and achieve faster time to market. However, cloud data lakes are object storages that evolved primarily to solve the problem of storage scalability. They weren't designed to store highly structured, strongly typed, analytical data. Given this, there are...