In this chapter, we looked at the main configuration parameters of EMR and how they help us run big data frameworks such as Spark, Hive, and Presto. We also explored two AWS services, Athena and Glue, as ways to catalog the data in our data lake so that we can keep our data pipelines properly synchronized. Finally, we demonstrated how Glue can also be used from EMR, with smooth integration into JupyterHub through SparkMagic.
In the next chapter, Deploying Models Built in AWS, we will cover how to deploy ML models in different environments.