Summary
In this chapter, we learned how to design a data layout that accelerates analytic workloads. We focused on three areas: how to store data optimally, how to manage the number of files and the size of each file, and how to optimize storage with Amazon S3.
In the first part, we learned techniques for storing data optimally: choosing file formats and compression types, understanding file splittability, and partitioning and bucketing. Next, we learned how data compaction keeps the number of files and the size of each file under control, which improves analytic query performance. In the last part, we learned how to optimize storage with Amazon S3 and AWS Glue DynamicFrames. You can use your storage effectively by archiving, expiring, and deleting data with Amazon S3 Lifecycle configurations and the Glue DynamicFrame methods.
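As a reminder of what archiving and expiring data looks like in practice, the following is a minimal sketch of an S3 Lifecycle configuration built in Python. The prefix `raw/logs/`, the 90-day and 365-day thresholds, the `GLACIER` storage class, and the helper names `build_lifecycle_config` and `apply_lifecycle_config` are all assumptions chosen for illustration, not values taken from this chapter. The sketch only builds and prints the configuration; applying it requires boto3, AWS credentials, and a bucket of your own.

```python
import json


def build_lifecycle_config(prefix, archive_after_days, expire_after_days):
    """Build an S3 Lifecycle configuration (hypothetical helper).

    Objects under `prefix` move to the GLACIER storage class after
    `archive_after_days` and are expired after `expire_after_days`.
    """
    if expire_after_days <= archive_after_days:
        raise ValueError("expire_after_days must be greater than archive_after_days")
    rule_id = "archive-then-expire-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Archive: transition objects to a colder storage class.
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                # Expire: remove objects once they are no longer needed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle_config(bucket, config):
    """Apply the configuration to a bucket (not called in this sketch)."""
    import boto3  # imported here so the rest of the sketch runs without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Example values are assumptions for illustration only.
config = build_lifecycle_config("raw/logs/", 90, 365)
print(json.dumps(config, indent=2))
```

Keeping the configuration as a plain dictionary makes it easy to review or store in version control before calling `apply_lifecycle_config` with a real bucket name.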
Managing the data in your data lake with techniques introduced in this chapter...