Limitations of Amazon EMR and possible workarounds
Understanding best practices is important because it helps you optimize your AWS usage for both performance and cost. Beyond best practices, it is equally important to understand the limitations of the service so that you can plan alternate workarounds.
The following are some of the limitations that you should consider while implementing big data workloads in Amazon EMR:
- S3 throughput: When you are writing to or reading from S3, there are a few API request limits that you should be aware of. S3 supports 3,500 PUT/COPY/POST/DELETE requests per second per prefix in a bucket and 5,500 GET/HEAD requests per second per prefix in a bucket. These limits apply per S3 prefix, but there is no limit on how many prefixes you can have. So, as a workaround, you should create more prefixes by leveraging a partition or sub-partition structure while storing data in S3, as shown in the sketch after this list. As an example, if you have...
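To illustrate the prefix-based workaround, the following is a minimal PySpark sketch; the bucket names, paths, and partition columns (year, month, day) are hypothetical placeholders, not values from this chapter. Writing with partitionBy() makes Spark create one S3 prefix per distinct partition value, so requests are spread across many prefixes instead of concentrating on a single one:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-prefix-partitioning").getOrCreate()

# Hypothetical source location, used for illustration only.
df = spark.read.parquet("s3://my-source-bucket/raw/orders/")

# partitionBy() writes each distinct (year, month, day) combination under
# its own prefix, for example .../year=2023/month=07/day=15/, so each
# prefix stays well below the per-prefix request-rate limits.
(df.write
   .partitionBy("year", "month", "day")
   .mode("overwrite")
   .parquet("s3://my-target-bucket/curated/orders/"))
```

A further benefit of this layout is that a job reading a single day's data only issues GET requests against that day's prefix, which both limits per-prefix load and reduces the amount of data scanned.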