Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Simplify Big Data Analytics with Amazon EMR

You're reading from   Simplify Big Data Analytics with Amazon EMR A beginner's guide to learning and implementing Amazon EMR for building data analytics solutions

Arrow left icon
Product type Paperback
Published in Mar 2022
Publisher Packt
ISBN-13 9781801071079
Length 430 pages
Edition 1st Edition
Concepts
Arrow right icon
Author (1):
Arrow left icon
Sakti Mishra Sakti Mishra
Author Profile Icon Sakti Mishra
Sakti Mishra
Arrow right icon
View More author details
Toc

Table of Contents (19) Chapters Close

Preface 1. Section 1: Overview, Architecture, Big Data Applications, and Common Use Cases of Amazon EMR
2. Chapter 1: An Overview of Amazon EMR FREE CHAPTER 3. Chapter 2: Exploring the Architecture and Deployment Options 4. Chapter 3: Common Use Cases and Architecture Patterns 5. Chapter 4: Big Data Applications and Notebooks Available in Amazon EMR 6. Section 2: Configuration, Scaling, Data Security, and Governance
7. Chapter 5: Setting Up and Configuring EMR Clusters 8. Chapter 6: Monitoring, Scaling, and High Availability 9. Chapter 7: Understanding Security in Amazon EMR 10. Chapter 8: Understanding Data Governance in Amazon EMR 11. Section 3: Implementing Common Use Cases and Best Practices
12. Chapter 9: Implementing Batch ETL Pipeline with Amazon EMR and Apache Spark 13. Chapter 10: Implementing Real-Time Streaming with Amazon EMR and Spark Streaming 14. Chapter 11: Implementing UPSERT on S3 Data Lake with Apache Spark and Apache Hudi 15. Chapter 12: Orchestrating Amazon EMR Jobs with AWS Step Functions and Apache Airflow/MWAA 16. Chapter 13: Migrating On-Premises Hadoop Workloads to Amazon EMR 17. Chapter 14: Best Practices and Cost-Optimization Techniques 18. Other Books You May Enjoy

EMR release history

As Amazon EMR is built on top of the open source Hadoop ecosystem, it tries to stay up to date with the open source stable releases, which includes new features and bug fixes.

Each EMR release comprises different Hadoop ecosystem applications or services that fit together with specific versions. EMR uses Apache Bigtop, which is an open source project within the Apache community to package the Hadoop ecosystem applications or components for an EMR release.

When you launch a cluster, you need to select the EMR cluster version and with advanced options, you can identify which version of each Hadoop application is integrated into that EMR release. If you are using AWS SDK or AWS CLI commands to create a cluster, you can specify the version using the release label. Release labels follow a naming convention of emr-x.x.x, for example, emr-6.3.0.

The EMR documentation clearly lists each release version and the Hadoop components integrated into it.

The following is a diagram of the EMR 6.3.0 release, which lists a few components of Hadoop services that are integrated into it and how it compares to previous releases of EMR 6.x:

Figure 1.7 – Diagram of EMR release version comparison

Figure 1.7 – Diagram of EMR release version comparison

If you were using open source Hadoop or any third-party Hadoop clusters and then migrating to EMR, it is best to go through the release documentation, understand different versions of Hadoop applications integrated into it, find the different configurations involved related to security, network access, authentication, authorization, and so on, and then evaluate it against your current Hadoop cluster to plan for migration.

With this, you have got a good overview of Amazon EMR, its benefits, its release history, and more. Now, let's compare it with a few other AWS services that are also based on Spark workloads and understand how they compare with Amazon EMR.

You have been reading a chapter from
Simplify Big Data Analytics with Amazon EMR
Published in: Mar 2022
Publisher: Packt
ISBN-13: 9781801071079
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime