Designing a partition strategy for efficiency/performance
In the last few sections, we explored the various storage and analytical partitioning options and learned how partitioning helps with performance, scale, security, availability, and so on. In this section, we will recap what we learned about performance and efficiency and look at some additional performance patterns.
Here are some strategies to keep in mind while designing for efficiency and performance:
- Partition datasets into smaller chunks that can be processed in parallel, so that multiple queries can run with optimal parallelism.
- Partition the data so that queries don't need to read much data from other partitions. Minimize cross-partition data transfers.
- Design effective folder structures to improve the efficiency of data reads and writes.
- Partition the data so that queries can prune (skip) a significant portion of it at run time.
- Partition in units of data that can be easily added, deleted, swapped...
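To make the folder-structure, pruning, and swap ideas above more concrete, here is a minimal, hypothetical sketch in plain Python. It assumes a Hive-style `year=YYYY/month=MM/day=DD` folder layout (a common convention, not one mandated by any particular service) and uses an in-memory dictionary to stand in for a data lake folder tree. The functions `partition_path`, `write_partitioned`, `query`, and `swap_partition` are illustrative names only, not part of any Azure or Spark API.

```python
from collections import defaultdict
from datetime import date

def partition_path(day: date) -> str:
    # Hypothetical Hive-style folder layout: year=YYYY/month=MM/day=DD
    return f"year={day.year}/month={day.month:02d}/day={day.day:02d}"

def parse_partition(path: str) -> date:
    # Recover the partition's date from its folder name, e.g. "year=2024/month=01/day=03"
    parts = dict(seg.split("=") for seg in path.split("/"))
    return date(int(parts["year"]), int(parts["month"]), int(parts["day"]))

def write_partitioned(records):
    # Group records into one in-memory "folder" per date partition
    store = defaultdict(list)
    for rec in records:
        store[partition_path(rec["date"])].append(rec)
    return dict(store)

def query(store, start: date, end: date):
    # Return matching rows plus how many partitions were actually scanned
    scanned, results = 0, []
    for path, rows in store.items():
        # Partition pruning: skip entire folders whose date falls outside the filter
        if not (start <= parse_partition(path) <= end):
            continue
        scanned += 1
        results.extend(rows)
    return results, scanned

def swap_partition(store, day: date, new_rows):
    # Replace a whole partition in one step instead of updating individual rows
    store[partition_path(day)] = list(new_rows)

# Usage: ten daily partitions, one record each (made-up sample data)
records = [{"date": date(2024, 1, d), "amount": d * 10} for d in range(1, 11)]
store = write_partitioned(records)
rows, scanned = query(store, date(2024, 1, 3), date(2024, 1, 5))
print(f"partitions={len(store)} scanned={scanned} rows={len(rows)}")
# partitions=10 scanned=3 rows=3

# Swap in a corrected day without touching the other partitions
swap_partition(store, date(2024, 1, 4), [{"date": date(2024, 1, 4), "amount": 999}])
rows2, _ = query(store, date(2024, 1, 4), date(2024, 1, 4))
```

Because the date filter is matched against folder names, seven of the ten partitions are never read. In a real engine, the equivalent saving comes from not listing or opening those files at all.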