AWS Glue best practices
As with many of the other services covered in this book, we will now offer some recommendations on how best to configure your AWS Glue jobs.
Amazon Athena, under the hood, uses the open-source software Presto to process Data Manipulation Language (DML) statements and Apache Hive to process Data Definition Language (DDL) statements. An example of a DML statement is a select statement, and an example of a DDL statement is a create table statement.
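To make the DML/DDL distinction concrete, here is a minimal sketch that classifies a statement by its leading keyword. The helper name statement_kind and the keyword sets are assumptions made for illustration only. This is not how Athena itself routes statements, and it deliberately ignores cases such as queries that start with a WITH clause.

```python
# Hypothetical helper for illustration; not part of Athena or any AWS SDK.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}
DML_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def statement_kind(sql: str) -> str:
    """Return 'DDL', 'DML', or 'OTHER' based on the statement's first keyword."""
    tokens = sql.strip().split(None, 1)
    if not tokens:
        return "OTHER"  # empty or whitespace-only input
    first_word = tokens[0].upper()
    if first_word in DDL_KEYWORDS:
        return "DDL"
    if first_word in DML_KEYWORDS:
        return "DML"
    return "OTHER"


if __name__ == "__main__":
    for sql in ("select * from sales", "create table sales (id int)"):
        print(sql, "->", statement_kind(sql))
```

In this simplified view, the select statement is treated as DML and the create table statement as DDL, matching the examples above.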
Similarly, under the hood, AWS Glue runs its ETL jobs using Apache Spark.
Knowing the technologies that underlie these AWS services will help you better leverage and optimize your use of Amazon Athena and AWS Glue.
Choosing the right worker type
AWS Glue can run a job with one of three different worker types. The capacity that each worker provides is measured in Data Processing Units (DPUs).
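As a rough illustration of where the worker type is chosen, the sketch below builds the keyword arguments that could be passed to the create_job call of the boto3 Glue client. The function name build_glue_job_args, the default values, and the worker type names Standard, G.1X, and G.2X are assumptions for this example, since the text above does not list them. The job name, role ARN, and script location in the usage lines are placeholders.

```python
# Assumed set of worker type names; check the current AWS Glue documentation.
ALLOWED_WORKER_TYPES = {"Standard", "G.1X", "G.2X"}


def build_glue_job_args(name, role_arn, script_location,
                        worker_type="G.1X", number_of_workers=10):
    """Build keyword arguments for a Glue Spark ETL job definition.

    Hypothetical helper: it only assembles a dictionary and makes no AWS calls.
    """
    if worker_type not in ALLOWED_WORKER_TYPES:
        raise ValueError(f"Unsupported worker type: {worker_type}")
    if number_of_workers < 2:
        # Assumption: a Spark ETL job needs at least two workers.
        raise ValueError("number_of_workers must be at least 2")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job command
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
    }


if __name__ == "__main__":
    args = build_glue_job_args(
        name="example-job",  # placeholder
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder
        script_location="s3://example-bucket/scripts/job.py",  # placeholder
        worker_type="G.2X",
    )
    print(args)
    # With boto3 installed and credentials configured, the job could then be
    # created with: boto3.client("glue").create_job(**args)
```

Keeping the argument construction separate from the API call makes it easy to review or unit test the chosen worker type and worker count before anything is created in an AWS account.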
Each type has different advantages and disadvantages, and they...