Optimizing Amazon Athena
As with any SQL engine, there are steps you can take to optimize the performance of your queries and inserts. As with traditional databases, optimizing data access performance usually comes at the expense of data ingestion, and vice versa. Let's look at some tips that you can use to improve performance.
Optimization of data partitions
One way to improve performance is to break up a dataset into smaller groups of files called partitions. A common partitioning scheme is to split the data on a column whose values recur regularly across the data. Some examples follow:
- Country
- Region
- Date
- Product
Partitions operate as virtual columns and assist in reducing the amount of data that needs to be read for each query. Partitions are normally defined at the time a table or file is created.
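To make the effect of partitioning concrete, here is a minimal Python sketch of partition pruning. It does not call Athena or any AWS API; it only models the idea that a filter on a partition column lets the engine skip whole groups of files. The `DataFile` class, the file names, the partition values, and the byte sizes are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """A hypothetical data file tagged with the partition values it belongs to."""
    name: str
    partition: dict
    size_bytes: int


def prune(files, **wanted):
    """Keep only the files whose partition values match every requested filter."""
    return [
        f for f in files
        if all(f.partition.get(key) == value for key, value in wanted.items())
    ]


def bytes_scanned(files):
    """Total size of the files that would have to be read."""
    return sum(f.size_bytes for f in files)


# Illustrative dataset: four files partitioned by country and date (made-up sizes).
FILES = [
    DataFile("part-0001", {"country": "US", "date": "2023-01-01"}, 400),
    DataFile("part-0002", {"country": "US", "date": "2023-01-02"}, 300),
    DataFile("part-0003", {"country": "DE", "date": "2023-01-01"}, 200),
    DataFile("part-0004", {"country": "DE", "date": "2023-01-02"}, 100),
]

if __name__ == "__main__":
    # No partition filter: every file is read.
    print("full scan:", bytes_scanned(FILES))
    # Filter on one partition column: the DE files are skipped entirely.
    print("country=US:", bytes_scanned(prune(FILES, country="US")))
    # Filter on both partition columns: a single file remains.
    print("country=US, date=2023-01-01:",
          bytes_scanned(prune(FILES, country="US", date="2023-01-01")))
```

In this toy model, the more partition columns a query filters on, the fewer files remain to be read. That is the same reason a query that filters on a partition column typically scans less data than one that does not.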
Amazon Athena can use Apache Hive partitions. Hive partitions use this naming convention:
s3://BucketName/TablePath/<...