Tuning AWS Glue workloads
As we saw in the previous chapter, AWS Glue is a serverless data integration service whose components come bundled with a number of optimizations that cover most use cases, with "most" being the operative word. These built-in optimizations may not be a perfect fit for our use case, and they can often be tuned further to get the most out of the resources we allocate.
It is still up to us to monitor workloads and implement optimizations where necessary so that resources are used efficiently. The performance of any Glue component depends on a number of factors, such as the input data, the resources allocated, the configuration, and the workflow itself.
Now, let’s discuss some of the tuning mechanisms we can use to optimize different components of AWS Glue.
Tuning AWS Glue crawlers
As discussed in the previous section, the performance of a Glue component depends...