Choosing between AWS Glue and Amazon EMR
Having learned about Glue and EMR, you may be wondering whether the two services do much the same job in data processing, and when to choose one over the other. AWS does have overlapping offerings, which can be confusing at times, but each serves a specific purpose. Amazon works backward from the customer, so each of these services exists because customers asked for it.
For data cataloging, the choice is a no-brainer: you should always use AWS Glue, and the resulting Glue Data Catalog can also be used by jobs that you run in EMR. For data processing, Glue's ETL jobs are built around the Spark framework, so if you want to use other open-source software such as Hive, Pig, or Presto, you need to choose EMR.
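To illustrate how an EMR cluster can share the Glue Data Catalog, here is a minimal sketch in Python that builds the cluster configuration entries pointing Hive and Spark at Glue as their metastore. The `hive-site` and `spark-hive-site` classifications and the factory class name follow the AWS documentation for EMR; the helper name `glue_catalog_configurations` and the application-to-classification mapping are assumptions made for this example, not part of any AWS SDK.

```python
import json

# Metastore client factory that AWS documents for using the Glue Data Catalog
# from Hive-compatible applications on EMR.
GLUE_METASTORE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)

# Hypothetical mapping for this sketch: EMR application -> configuration
# classification that carries its metastore setting.
CLASSIFICATION_BY_APP = {
    "Hive": "hive-site",
    "Spark": "spark-hive-site",
}


def glue_catalog_configurations(applications):
    """Return EMR configuration entries that use Glue as the metastore.

    Applications without an entry in CLASSIFICATION_BY_APP are skipped.
    """
    configs = []
    for app in applications:
        classification = CLASSIFICATION_BY_APP.get(app)
        if classification is None:
            continue  # nothing to configure for this application here
        configs.append(
            {
                "Classification": classification,
                "Properties": {
                    "hive.metastore.client.factory.class": GLUE_METASTORE_FACTORY
                },
            }
        )
    return configs


if __name__ == "__main__":
    # Print the JSON you could supply as the cluster's configurations.
    print(json.dumps(glue_catalog_configurations(["Spark", "Hive", "Pig"]), indent=2))
```

The resulting list has the shape that EMR expects for cluster configurations, for example the `Configurations` argument of boto3's `run_job_flow` call or the software settings field in the console. With it in place, tables defined once in Glue are visible to both Glue jobs and Spark or Hive jobs on the cluster.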
When you run data transformations on the Spark platform, you must choose between EMR and Glue. Suppose you are migrating your ETL job from an on-premises Hadoop environment. In that case, you can go with EMR as it will require minimal code...