Chapter 1: An Overview of Amazon EMR
This chapter provides an overview of Amazon Elastic MapReduce (EMR), its benefits for big data processing, and how its cluster design compares with that of on-premises Hadoop clusters. It then explains how Amazon EMR integrates with other Amazon Web Services (AWS) services and how you can build a Lake House architecture in AWS.
You will then learn the differences between the Amazon EMR, AWS Glue, and AWS Glue DataBrew services. Understanding these differences will help you choose among the options available when deploying Hadoop or Spark workloads in AWS.
This chapter assumes that you are familiar with Hadoop-based big data processing workloads, have had exposure to basic AWS concepts, and are looking for an overview of the Amazon EMR service so that you can use it for your big data processing workloads.
The following topics will be covered in this chapter:
- What is Amazon EMR?
- Overview of Amazon EMR
- Decoupling compute and storage
- Integration with other AWS services
- EMR release history
- Comparing Amazon EMR with AWS Glue and AWS Glue DataBrew