Partition management
In the previous sections, we discussed how to automatically update and add partitions to tables. This means that with an easy setup, Glue is capable of adding partitions continuously as your dataset grows.
For very large data lakes, however, this setup can run into issues. Glue supports up to 10 million partitions per table by default; without proper management, though, a partition count that large will steadily increase your query execution times.
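To get a feel for how many partitions a table has accumulated, you can page through the Data Catalog and count them. The following is a minimal sketch, not a definitive implementation: it assumes a boto3 Glue client is passed in, and the database and table names in the usage comment (sales_db, product_sales) are hypothetical.

```python
from typing import Any


def count_partitions(glue_client: Any, database: str, table: str) -> int:
    """Count the partitions registered for one Data Catalog table.

    glue_client is expected to behave like boto3.client("glue").
    """
    # GetPartitions returns results in pages, so use the paginator
    # instead of handling NextToken by hand.
    paginator = glue_client.get_paginator("get_partitions")
    pages = paginator.paginate(
        DatabaseName=database,
        TableName=table,
        # Skip column schemas in the response; only the count matters here.
        ExcludeColumnSchema=True,
    )
    return sum(len(page["Partitions"]) for page in pages)


# Example usage (hypothetical names; requires AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   print(count_partitions(glue, "sales_db", "product_sales"))
```

Note that on a table with millions of partitions this full scan is itself slow, which is a small-scale illustration of the problem this section describes.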
Partition indexes
Let’s take the example of a table storing product sales information. The table is partitioned by product category. The business started small, with only a handful of categories, but as we expanded and added external sellers, the number grew to tens of thousands of categories.
Our business analysts want to query data based on product families, so their Glue ETL queries usually include a WHERE CATEGORY= clause, filtering by category. Every time...